Slide 1

Slide 1 text

Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling
Jiancheng Yang, Qiang Zhang, Bingbing Ni, Linguo Li, Jinxian Liu, Mengdie Zhou, Qi Tian
Shanghai Jiao Tong University / MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University / Huawei Noah's Ark Lab
CVPR 2019

Slide 2

Slide 2 text

Overview
The paper proposes the Point Attention Transformer (PAT), an end-to-end attention-based architecture for point cloud learning. Its two key components are Group Shuffle Attention (GSA), a parameter-efficient self-attention that models relations between points, and Gumbel Subset Sampling (GSS), a differentiable, permutation-invariant down-sampling that replaces the commonly used Furthest Point Sampling (FPS).
[Teaser figure: Furthest Point Sampling vs. Group Shuffle Attention / Gumbel Subset Sampling, with outlier attention weights highlighted.]

Slide 3

Slide 3 text

Background
A point cloud is an unordered set of points, so a network that consumes it must be permutation-invariant. Earlier work enforces this with symmetric functions such as max or average pooling over per-point features (e.g., PointNet). Multi-Head Attention, the building block of the Transformer, is likewise a symmetric set operation, which motivates using self-attention to model interactions between points.
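A minimal NumPy sketch (mine, not from the paper) that checks the two properties this slide leans on: max pooling over the point axis is permutation-invariant, while scaled dot-product self-attention is permutation-equivariant, i.e., its per-point outputs get reordered together with the inputs.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 4))            # 6 points, 4 feature channels
    perm = rng.permutation(6)              # a random reordering of the points

    def softmax(a, axis=-1):
        e = np.exp(a - a.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X):
        # Scaled dot-product self-attention with identity projections.
        scores = X @ X.T / np.sqrt(X.shape[1])
        return softmax(scores) @ X

    # Max pooling: the global feature is identical under any permutation.
    assert np.allclose(X.max(axis=0), X[perm].max(axis=0))
    # Self-attention: per-point outputs are the same, just reordered.
    assert np.allclose(self_attention(X)[perm], self_attention(X[perm]))
    print("max pooling is invariant; self-attention is equivariant")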

Slide 4

Slide 4 text

Outline of the proposed method
Point Attention Transformer (PAT), built from three components:
- Absolute and Relative Position Embedding (ARPE): embeds each point using its absolute position and its offsets to neighboring points.
- Group Shuffle Attention (GSA): parameter-efficient self-attention between points.
- Gumbel Subset Sampling (GSS): differentiable, permutation-invariant down-sampling.

Slide 5

Slide 5 text

ARPE: input representation
The input is a point set $X = \{x_1, x_2, \ldots, x_i, \ldots, x_N\}$. Each point $x_i$ is represented by its absolute position together with its relative positions to the other points: $x_i' = \{(x_i, x_j - x_i) \mid j \neq i\}$.

Slide 6

Slide 6 text

ARPE: embedding
The relative positions are restricted to a nearest-neighbor graph, and the embedding is computed PointNet-style:
$\mathrm{ARPE}(x_i) = \gamma \circ \max\{\, h(x_j') \mid x_j' \in \mathcal{N}(x_i) \,\}$
where $\gamma$ and $h$ are MLPs (with group normalization) and $\mathcal{N}(x_i)$ is the neighborhood of $x_i$. Taking the max over neighbors keeps the embedding invariant to their ordering.
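A hedged sketch of the ARPE idea in NumPy: gather the k nearest neighbors, form (absolute position, relative offset) pairs, push them through a shared MLP h, max-pool over neighbors, and map the result with gamma. The single random ReLU layers standing in for h and gamma, and k = 4, are my simplifications; the paper uses deeper MLPs.

    import numpy as np

    rng = np.random.default_rng(0)

    def mlp(W):
        # One random ReLU layer as a stand-in for a full MLP.
        return lambda a: np.maximum(a @ W, 0.0)

    def arpe(X, k=4, c_out=32):
        N, d = X.shape
        h = mlp(rng.normal(size=(2 * d, 64)) * 0.1)        # acts on (x_i, x_j - x_i)
        gamma = mlp(rng.normal(size=(64, c_out)) * 0.1)
        # k nearest neighbors of every point (excluding the point itself).
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        np.fill_diagonal(d2, np.inf)
        knn = np.argsort(d2, axis=1)[:, :k]                # (N, k)
        rel = X[knn] - X[:, None, :]                       # x_j - x_i, shape (N, k, d)
        absp = np.broadcast_to(X[:, None, :], rel.shape)   # x_i repeated per neighbor
        feats = h(np.concatenate([absp, rel], axis=-1))    # (N, k, 64)
        return gamma(feats.max(axis=1))                    # max over neighbors -> (N, c_out)

    X = rng.normal(size=(128, 3))
    print(arpe(X).shape)   # (128, 32)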

Slide 7

Slide 7 text

Group Shuffle Attention (GSA)
GSA is a parameter-efficient self-attention block operating on point features. It chains a Group Linear transform, dot-product self-attention within each channel group, and a channel shuffle (diagram: Group Linear → Self-Attention → Shuffle → GSA).

Slide 8

Slide 8 text

GSA (1): Group Linear
A feature vector $y = (y_1, \ldots, y_{c_g}, y_{c_g+1}, \ldots, y_c)$ with $c$ channels is split into $g$ groups of $c_g = c/g$ channels each:
$y^{(i)} = \{\, y_{i \cdot c_g + j} \mid j = 1, \ldots, c_g \,\}, \quad i = 0, \ldots, g - 1$
and each group is transformed by its own weight matrix, so the parameter count drops by a factor of $g$ compared with a full linear layer.
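A minimal sketch of the parameter saving, with random stand-in weights: the per-group maps cost $g \cdot c_g^2 = c^2/g$ parameters instead of $c^2$ for a full linear layer.

    import numpy as np

    rng = np.random.default_rng(0)
    N, c, g = 128, 64, 8
    c_g = c // g

    X = rng.normal(size=(N, c))
    W = rng.normal(size=(g, c_g, c_g)) * 0.1   # one (c_g x c_g) matrix per group

    Xg = X.reshape(N, g, c_g)                  # group i holds channels i*c_g .. (i+1)*c_g - 1
    Yg = np.einsum('ngc,gcd->ngd', Xg, W)      # per-group linear map
    Y = Yg.reshape(N, c)
    print(Y.shape, W.size, 'vs full:', c * c)  # (128, 64) 512 vs full: 4096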

Slide 9

Slide 9 text

GSA (2): Dot-Product In-Group Attention
Scaled dot-product self-attention is applied independently inside each group:
$\mathrm{Attn}(Q, K) = s(Q, K) \cdot K$, where $s$ is the softmax of the scaled dot product, and
$\mathrm{GroupAttn}(Y) = \mathrm{concat}(\{\mathrm{Attn}(Y_i, Y_i)\}_{i = 1, \ldots, g}), \quad Y_i = Y^{(i)} W_i$
with learnable per-group weights $W_i \in \mathbb{R}^{c_g \times c_g}$, followed by a non-linearity.
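A NumPy sketch of the formula above: $Y_i = Y^{(i)} W_i$ is used as both query and key, and scaled dot-product attention runs over the points independently in each group (weights are random here; in the block they are learned).

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(a, axis=-1):
        e = np.exp(a - a.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def group_attn(X, W):
        N, c = X.shape
        g, c_g, _ = W.shape
        Yg = np.einsum('ngc,gcd->ngd', X.reshape(N, g, c_g), W)  # Y_i = Y^(i) W_i
        out = np.empty_like(Yg)
        for i in range(g):
            Yi = Yg[:, i, :]                         # (N, c_g)
            S = softmax(Yi @ Yi.T / np.sqrt(c_g))    # attention over the N points
            out[:, i, :] = S @ Yi                    # Attn(Y_i, Y_i) = s(Y_i, Y_i) . Y_i
        return out.reshape(N, c)                     # concat over groups

    X = rng.normal(size=(128, 64))
    W = rng.normal(size=(8, 8, 8)) * 0.1
    print(group_attn(X, W).shape)   # (128, 64)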

Slide 10

Slide 10 text

GSA (3): Channel Shuffle
Attention inside a group never mixes information across groups, so a channel shuffle $\chi$ (as in ShuffleNet) is applied to the output: $\mathrm{GroupAttn}(Y) \mapsto \chi(\mathrm{GroupAttn}(Y))$. After shuffling, every new group contains one channel from each original group.
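Channel shuffle is a pure reindexing; a NumPy sketch: reshape the channel axis to (g, c_g), transpose, and flatten, so consecutive output channels come from different input groups.

    import numpy as np

    def channel_shuffle(X, g):
        N, c = X.shape
        return X.reshape(N, g, c // g).transpose(0, 2, 1).reshape(N, c)

    X = np.arange(12, dtype=float).reshape(1, 12)  # one point, 12 channels, groups of 4
    print(X[0])                                    # [0 1 2 3 | 4 5 6 7 | 8 9 10 11]
    print(channel_shuffle(X, g=3)[0])              # [0 4 8 | 1 5 9 | 2 6 10 | 3 7 11]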

Slide 11

Slide 11 text

GSA (4): full block
A residual connection, Group Norm $\mathrm{gn}$, and a non-linearity $\sigma$ (ELU) complete the block:
$\mathrm{GSA}(Y) = \sigma(\mathrm{gn}(\chi(\mathrm{GroupAttn}(Y)) + Y))$
so that every point can aggregate information from every other point while the layer stays cheap in parameters.
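A hedged end-to-end sketch of one GSA block, restating the helpers so the snippet runs standalone; the random W stands in for learned weights, and the group norm omits learnable affine parameters for brevity.

    import numpy as np

    def softmax(a, axis=-1):
        e = np.exp(a - a.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def group_attn(X, W):
        N, c = X.shape
        g, c_g, _ = W.shape
        Yg = np.einsum('ngc,gcd->ngd', X.reshape(N, g, c_g), W)
        out = np.stack([softmax(Y @ Y.T / np.sqrt(c_g)) @ Y
                        for Y in np.moveaxis(Yg, 1, 0)], axis=1)
        return out.reshape(N, c)

    def channel_shuffle(X, g):
        N, c = X.shape
        return X.reshape(N, g, c // g).transpose(0, 2, 1).reshape(N, c)

    def group_norm(X, g, eps=1e-5):
        N, c = X.shape
        Xg = X.reshape(N, g, c // g)
        Xg = (Xg - Xg.mean(axis=2, keepdims=True)) / np.sqrt(Xg.var(axis=2, keepdims=True) + eps)
        return Xg.reshape(N, c)

    def elu(a):
        return np.where(a > 0, a, np.exp(np.minimum(a, 0)) - 1.0)

    def gsa_block(X, W, g):
        Y = channel_shuffle(group_attn(X, W), g)   # chi(GroupAttn(Y))
        return elu(group_norm(Y + X, g))           # residual + GroupNorm + ELU

    rng = np.random.default_rng(0)
    X = rng.normal(size=(128, 64))
    W = rng.normal(size=(8, 8, 8)) * 0.1
    print(gsa_block(X, W, g=8).shape)              # (128, 64)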

Slide 12

Slide 12 text

PAT architecture
Figure 2. Point Attention Transformer architecture for classification (top branch) and segmentation (bottom branch). The input points are first embedded into high-level representations through an Absolute and Relative Position Embedding (ARPE) module, resulting in some representative points (bigger in the figure). In classification, the features alternately pass through Group Shuffle Attention (GSA) blocks and down-sampling blocks, either Furthest Point Sampling (FPS) or our Gumbel Subset Sampling (GSS). In segmentation, only GSA layers are used. Finally, a shared MLP is connected to every point, followed by an element-wise classification loss or segmentation loss for training.
In short: the classification branch alternates GSA with down-sampling (FPS or GSS), while the segmentation branch stacks GSA layers only.
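For reference, the FPS baseline that GSS replaces, as a minimal NumPy sketch: greedily add the point farthest from everything chosen so far. The result depends on the start index and works purely in coordinate space, which is part of what motivates a learnable, differentiable sampler.

    import numpy as np

    def furthest_point_sampling(X, m, start=0):
        chosen = [start]
        d = ((X - X[start]) ** 2).sum(axis=1)      # squared distance to the chosen set
        for _ in range(m - 1):
            nxt = int(d.argmax())                  # farthest point from the subset
            chosen.append(nxt)
            d = np.minimum(d, ((X - X[nxt]) ** 2).sum(axis=1))
        return np.array(chosen)

    X = np.random.default_rng(0).normal(size=(1024, 3))
    print(furthest_point_sampling(X, 256).shape)   # (256,) indices of sampled points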

Slide 13

Slide 13 text

Gumbel Subset Sampling (GSS): motivation
Down-sampling maps $X_i \in \mathbb{R}^{N_i \times c}$ to a smaller set of $N_{i+1}$ points. Instead of a heuristic, the paper uses a hard and discrete selection that is end-to-end trainable via Gumbel-Softmax. For a single point:
$y_{\mathrm{gumbel}} = \mathrm{gumbel\_softmax}(w X_i^T) \cdot X_i, \quad w \in \mathbb{R}^c$
In the training phase this provides smooth gradients through the discrete reparameterization trick; with annealing, it degenerates to a hard selection in the test phase.
[Figure (b), Gumbel Subset Sampling: Input $(N_i, c)$ → FC layer → logits $(N_{i+1}, N_i)$ → add Gumbel noise and scale by $1/\tau$ → Softmax → dot product with the input → output $(N_{i+1}, c)$; annealing $\tau \to 0^+$ from $\tau = 1$.]

Slide 14

Slide 14 text

During training ($\tau = 1$) the selection is soft and differentiable; annealing $\tau \to 0^+$ makes it a hard selection at test time. GSS is task-agnostic, trained only by the task loss, and a proposition in the paper guarantees its permutation-invariance. The next slides review the Gumbel-Softmax trick behind this.

Slide 15

Slide 15 text

Why Gumbel-Softmax?
Drawing a discrete sample from a categorical distribution $\pi(x)$, e.g., via $\mathrm{argmax}$ over $\mathrm{softmax}(\pi(x))$ scores, is not differentiable, so gradients cannot flow back through the selection step; a reparameterization is needed.

Slide 16

Slide 16 text

The Gumbel-Softmax trick
For class probabilities $\pi(x) = \{\pi_0, \pi_1, \ldots, \pi_k\}$, sample
$y_i = \dfrac{\exp((\log \pi_i + g_i)/\tau)}{\sum_{j=0}^{k} \exp((\log \pi_j + g_j)/\tau)}, \quad g_i = -\log(-\log u_i), \; u_i \sim \mathrm{Uniform}(0, 1)$
or compactly $y = \mathrm{softmax}((\log \pi(x) + g)/\tau)$, where $\tau > 0$ is a temperature. The Gumbel noise makes the soft sample follow the categorical distribution while keeping the expression differentiable (Jang et al., "Categorical Reparameterization with Gumbel-Softmax", ICLR 2017).
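A minimal NumPy sketch of the estimator exactly as defined above; gumbel_softmax is my helper name, and the log-probabilities are passed in directly.

    import numpy as np

    def gumbel_softmax(log_pi, tau, rng):
        g = -np.log(-np.log(rng.uniform(size=log_pi.shape)))  # Gumbel(0, 1) noise
        z = (log_pi + g) / tau
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    pi = np.array([0.1, 0.2, 0.3, 0.4])
    y = gumbel_softmax(np.log(pi), tau=1.0, rng=rng)
    print(y, y.sum())   # a point on the simplex; approaches one-hot as tau -> 0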

Slide 17

Slide 17 text

Distribution of the sampled values
The density of the Gumbel-Softmax distribution (derived in Appendix B of Jang et al.) is
$p_{\pi,\tau}(y_1, \ldots, y_k) = \Gamma(k)\, \tau^{k-1} \Big( \sum_{i=1}^{k} \pi_i / y_i^{\tau} \Big)^{-k} \prod_{i=1}^{k} \big( \pi_i / y_i^{\tau+1} \big)$
This distribution was independently discovered by Maddison et al. (2016), where it is referred to as the concrete distribution. As the softmax temperature $\tau$ approaches 0, samples from the Gumbel-Softmax distribution become one-hot and the distribution becomes identical to the categorical distribution $p(z)$.
[Figure 1 of Jang et al.: expectation and samples of (a) the categorical distribution and (b) Gumbel-Softmax at $\tau = 0.1, 0.5, 1.0, 10.0$; the Gumbel-Softmax distribution interpolates between discrete one-hot-encoded categorical distributions and a near-uniform continuous density.]
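A quick check of the temperature behavior described above, repeating the gumbel_softmax sketch so it runs standalone: small tau gives near one-hot samples, large tau gives near-uniform ones.

    import numpy as np

    def gumbel_softmax(log_pi, tau, rng):
        g = -np.log(-np.log(rng.uniform(size=log_pi.shape)))
        z = (log_pi + g) / tau
        e = np.exp(z - z.max())
        return e / e.sum()

    rng = np.random.default_rng(0)
    log_pi = np.log(np.array([0.1, 0.2, 0.3, 0.4]))
    for tau in (0.1, 0.5, 1.0, 10.0):
        y = gumbel_softmax(log_pi, tau, rng)
        print(f'tau={tau:5.1f}  max={y.max():.3f}  y={np.round(y, 3)}')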

Slide 18

Slide 18 text

GSS: multi-point form
Gumbel Subset Sampling is the multi-point version of the single-point selection, i.e., a distribution over subsets:
$\mathrm{GSS}(X_i) = \mathrm{gumbel\_softmax}(W X_i^T) \cdot X_i, \quad W \in \mathbb{R}^{N_{i+1} \times c}$
A learnable $W$ selects $N_{i+1}$ representative points from the $N_i$ inputs, differentiably during training and as a hard choice after annealing.
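A hedged sketch of the multi-point layer above: a weight matrix W (random here, learned through an FC layer in the paper) scores all N_i points, a row-wise Gumbel-Softmax makes one soft selection per output point, and the dot product with X_i yields the sampled subset.

    import numpy as np

    def softmax(a, axis=-1):
        e = np.exp(a - a.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def gss(X, W, tau, rng):
        scores = W @ X.T                                      # logits, (N_out, N_in)
        g = -np.log(-np.log(rng.uniform(size=scores.shape)))  # Gumbel(0, 1) noise
        S = softmax((scores + g) / tau, axis=1)               # one soft selection per row
        return S @ X                                          # sampled subset, (N_out, c)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1024, 64))           # N_i points with c = 64 channels
    W = rng.normal(size=(256, 64)) * 0.1      # N_{i+1} x c; trained end-to-end in PAT
    print(gss(X, W, tau=0.1, rng=rng).shape)  # (256, 64); anneal tau -> 0+ for hard picks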

Slide 19

Slide 19 text

Experiments
PAT is evaluated on ModelNet40 (shape classification) and S3DIS (3D semantic segmentation), reporting accuracy, model size, and inference time.

Slide 20

Slide 20 text

Classification accuracy on ModelNet40

Method           Points  Accuracy (%)
DeepSets [51]    5,000   90.0
PointNet [30]    1,024   89.2
Kd-Net [19]      1,024   90.6
PointNet++ [32]  1,024   90.7
KCNet [34]       1,024   91.0
DGCNN [42]       1,024   92.2
PointCNN [23]    1,024   92.2
PAT (GSA only)   1,024   91.3
PAT (GSA only)   256     90.9
PAT (FPS)        1,024   91.4
PAT (FPS + GSS)  1,024   91.7

Table 1. Classification performance on the ModelNet40 dataset.

Slide 21

Slide 21 text

Model size and speed on ModelNet40

Method           Size (MB)  Time (ms)  Accuracy (%)
PointNet [30]    40         25.3       89.2
PointNet++ [32]  12         163.2      90.7
DGCNN [42]       21         94.6       92.2
PAT (GSA only)   5          132.9      91.3
PAT (FPS)        5          87.6       91.4
PAT (FPS + GSS)  5.8        88.6       91.7

Table 2. Model size ("Size", MB), forward time ("Time", ms), and accuracy on the ModelNet40 dataset.

Slide 22

Slide 22 text

Semantic segmentation on S3DIS
3D semantic segmentation is evaluated on the S3DIS dataset with mean per-class IoU (mIoU) as the metric, reported both under 6-fold cross-validation and on Area 5.

Slide 23

Slide 23 text

Method         mIoU   mIoU on Area 5  Size (MB)
RSNet [13]     56.47  -               -
SPGraph [20]   62.1   58.04           -
PointNet [30]  47.71  47.6            4.7
DGCNN [42]     56.1   -               6.9
PointCNN [23]  65.39  57.26           46.2
PAT            64.28  60.07           6.1

Table 3. 3D semantic segmentation results on S3DIS. Mean per-class IoU (mIoU, %) is used as the evaluation metric. Model sizes are obtained using the official codes.

PointCNN has the highest mIoU under 6-fold cross-validation, while PAT achieves the highest mIoU on Area 5 at a fraction of the model size.

Slide 24

Slide 24 text

Summary
- Proposed the Point Attention Transformer (PAT), an end-to-end self-attention architecture for point clouds.
- Group Shuffle Attention (GSA) provides parameter-efficient self-attention between points.
- Gumbel Subset Sampling (GSS) is a permutation-invariant, task-agnostic, differentiable down-sampling trained end to end, replacing Furthest Point Sampling.
- PAT achieves accuracy competitive with prior methods on ModelNet40 and S3DIS with a much smaller model.