... Tong University; ...ence, AI Institute, Shanghai Jiao Tong University; ...wei Noah's Ark Lab
[Figure: Furthest Point Sampling, Group Shuffle Attention, Gumbel Subset Sampling; labels: Outlier, Attention Weights]
[Figure: panel labels Group1, Group2, Group3; Point 1, Point 2, Point 3; Non-linearity]
[Figure 2 panel labels: Self-Attention, GSA, GSA & Down Sampling (repeating for i times), Group1, Group2, Group3, Shuffled Group, MLP, Nearest-neighbor Graph, Classification, Segmentation, Element-wise Classification Loss, Segmentation Loss]

Figure 2. Point Attention Transformer architecture for classification (top branch) and segmentation (bottom branch). The input points are first embedded into high-level representations through an Absolute and Relative Position Embedding (ARPE) module, resulting in some representative points (shown larger in the figure). In classification, the features alternately pass through Group Shuffle Attention (GSA) blocks and down-sampling blocks, using either Furthest Point Sampling (FPS) or our Gumbel Subset Sampling (GSS). In segmentation, only GSA layers are used. Finally, a shared MLP is connected to every point, followed by an element-wise classification loss or segmentation loss for training.
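The FPS down-sampling step named in the caption can be sketched as follows. This is the standard greedy furthest point sampling procedure written in NumPy, not necessarily the implementation used for PAT; the function name `furthest_point_sampling` and the `start_index` parameter are assumptions made for illustration.

```python
import numpy as np

def furthest_point_sampling(points, n_samples, start_index=0):
    """Greedy FPS: repeatedly pick the point furthest from those already chosen.

    points: array of shape (N, d). Returns the indices of the chosen points.
    """
    selected = np.empty(n_samples, dtype=np.int64)
    selected[0] = start_index
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start_index], axis=1)
    for k in range(1, n_samples):
        selected[k] = int(np.argmax(min_dist))
        new_dist = np.linalg.norm(points - points[selected[k]], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return selected

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))           # toy point cloud
idx = furthest_point_sampling(cloud, 256)
subset = cloud[idx]                          # down-sampled cloud, shape (256, 3)
```

Because each step picks the point with the largest distance to the current subset, isolated outliers tend to be selected early, which is the sensitivity the "Outlier" label in the FPS illustration appears to point at.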
(b) Gumbel Subset Sampling
[Figure: Dot-Product, 1/τ, Softmax; tensor shapes (N_i, c), (N_{i+1}, N_i), (N_{i+1}, N_i), (N_{i+1}, c); Annealing: τ = 1, τ → 0+]

... The core representation ... Instead, we use a hard and discrete selection with an end-to-end trainable gumbel softmax (Eq. 3):

$$y_{\text{gumbel}} = \text{gumbel\_softmax}(w X_i^T) \cdot X_i, \quad w \in \mathbb{R}^{c}.$$

In the training phase, it provides smooth gradients through the discrete reparameterization trick. With annealing, it degenerates to a hard selection in the test phase.

Gumbel Subset Sampling (GSS) is simply a multiple-point version of Eq. 13, which means a distribution of subsets:

$$\text{GSS}(X_i) = \text{gumbel\_softmax}(W X_i^T) \cdot X_i, \quad W \in \mathbb{R}^{N_{i+1} \times c},$$

where $X_i \in \mathbb{R}^{N_i \times c}$ and $X_{i+1} = \text{GSS}(X_i) \in \mathbb{R}^{N_{i+1} \times c}$. The following proposition theoretically guarantees the permutation-invariance of GSS.
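A minimal NumPy sketch of the GSS formula, assuming the shapes shown in the diagram: input (N_i, c), selection weights (N_{i+1}, N_i), output (N_{i+1}, c). The function name `gumbel_subset_sampling` and the `hard` flag are illustrative assumptions, and the scores W X^T are treated as unnormalized log-probabilities. It shows the forward computation only and omits the gradient path used in training.

```python
import numpy as np

def gumbel_subset_sampling(X, W, tau, rng, hard=False):
    """Select N_out points from X of shape (N_in, c) using W of shape (N_out, c)."""
    logits = W @ X.T                                   # (N_out, N_in) scores
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    noisy = logits + (-np.log(-np.log(u)))             # add Gumbel(0, 1) noise
    if hard:
        # Limit tau -> 0+: each output row copies exactly one input point.
        return X[np.argmax(noisy, axis=1)]
    z = noisy / tau
    z -= z.max(axis=1, keepdims=True)                  # numerical stability
    weights = np.exp(z)
    weights /= weights.sum(axis=1, keepdims=True)      # softmax over input points
    return weights @ X                                 # (N_out, c)

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))      # N_i = 128 points, c = 16 channels
W = rng.normal(size=(32, 16))       # N_{i+1} = 32 output points
soft = gumbel_subset_sampling(X, W, tau=1.0, rng=rng)
hard = gumbel_subset_sampling(X, W, tau=1.0, rng=rng, hard=True)
```

In the soft case every output row is a convex combination of input points. In the hard case every output row is an actual input point, which corresponds to the test-phase behaviour described above. Nothing in this sketch prevents two output rows from selecting the same input point.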
$$y_i = \frac{\exp\big((\log \pi_i + g_i)/\tau\big)}{\sum_{j=1}^{k} \exp\big((\log \pi_j + g_j)/\tau\big)}, \qquad g_i \sim -\log\big(-\log \text{Uniform}(0, 1)\big)$$

In vector form: $y = \text{softmax}\big((\log \pi + g)/\tau\big)$.
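The sampling formula can be checked numerically with a short NumPy sketch. The function name `gumbel_softmax_sample` and the example probabilities are assumptions for illustration. The check relies on the Gumbel-max property: the argmax of a sample follows the categorical distribution π at any temperature τ, while τ controls how close the sample is to one-hot.

```python
import numpy as np

def gumbel_softmax_sample(pi, tau, rng, size=None):
    """Draw y = softmax((log(pi) + g) / tau) with g ~ Gumbel(0, 1)."""
    shape = pi.shape if size is None else (size,) + pi.shape
    u = rng.uniform(low=1e-12, high=1.0, size=shape)
    g = -np.log(-np.log(u))                      # Gumbel(0, 1) noise
    z = (np.log(pi) + g) / tau
    z = z - z.max(axis=-1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
pi = np.array([0.1, 0.2, 0.7])                   # example class probabilities

y = gumbel_softmax_sample(pi, tau=0.5, rng=rng)  # one relaxed sample

# The argmax of each sample should be distributed according to pi.
samples = gumbel_softmax_sample(pi, tau=0.5, rng=rng, size=20000)
freq = np.bincount(samples.argmax(axis=1), minlength=3) / 20000

# Lower temperature gives samples closer to one-hot.
cold = gumbel_softmax_sample(pi, tau=0.1, rng=rng, size=20000).max(axis=1).mean()
hot = gumbel_softmax_sample(pi, tau=10.0, rng=rng, size=20000).max(axis=1).mean()
```

Here `freq` should be close to `pi`, and `cold` should be larger than `hot`, since low temperatures concentrate each sample on a single category.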
... distribution (derived in Appendix B) is:

$$p_{\pi,\tau}(y_1, \ldots, y_k) = \Gamma(k)\, \tau^{k-1} \left( \sum_{i=1}^{k} \pi_i / y_i^{\tau} \right)^{-k} \prod_{i=1}^{k} \left( \pi_i / y_i^{\tau+1} \right)$$

This distribution was independently discovered by Maddison et al. (2016), where it is referred to as the concrete distribution. As the softmax temperature τ approaches 0, samples from the Gumbel-Softmax distribution become one-hot and the Gumbel-Softmax distribution becomes identical to the categorical distribution p(z).

[Figure 1 panels: a) Categorical; b) τ = 0.1, τ = 0.5, τ = 1.0, τ = 10.0; rows: expectation, sample; axis: category]
Figure 1: The Gumbel-Softmax distribution interpolates between discrete one-hot-encoded cate...
...              56.47   -       -
SPGraph [20]     62.1    58.04   -
PointNet [30]    47.71   47.6    4.7
DGCNN [42]       56.1    -       6.9
PointCNN [23]    65.39   57.26   46.2
PAT              64.28   60.07   6.1

Table 3. 3D semantic segmentation results on S3DIS. Mean per-class IoU (mIoU, %) is used as evaluation metric. Model sizes are obtained using the official codes.

To further analyze the performance between PointCNN ...