modeling_point_cloud.pdf

Modeling Point Clouds with Self-Attention and Gumbel Subset Sampling
Jiancheng Yang, Qiang Zhang, Bingbing Ni, Linguo Li, Jinxian Liu, Mengdie Zhou, Qi Tian
Shanghai Jiao Tong University; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; Huawei Noah's Ark Lab
CVPR

Overview
Terms on the slide: Point Attention Transformers, attention framework, Gumbel Softmax.
Figure labels: Furthest Point Sampling, Group Shuffle Attention, Gumbel Subset Sampling, Outlier Attention Weights.

Background
Terms on the slide: average pooling, multi-head attention.

Proposed method overview: Point Attention Transformers
Components: Absolute and Relative Position Embedding, Group Shuffle Attention, and Gumbel Subset Sampling (permutation invariant).

Absolute and Relative Position Embedding (ARPE)
Figure label: Nearest-neighbor Graph.

$$X = \{x_1, x_2, \dots, x_i, \dots, x_N\}$$
$$X_i' = \{(x_i,\ x_j - x_i) \mid j \neq i\}$$
$$\mathrm{ARPE}(x_i) = \gamma \circ \max\{\, h(x_i') \mid x_i' \in X_i' \,\}$$

$\gamma$, $h$: MLP with Group Norm.
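The ARPE definition can be sketched in NumPy as follows. This is a minimal sketch under assumptions that are not stated on the slide: the set $X_i'$ is restricted to the `k` nearest neighbors of $x_i$, $h$ and $\gamma$ are each replaced by a single ReLU layer with hypothetical weights `w_h` and `w_gamma`, and Group Norm is omitted.

```python
import numpy as np

def arpe(points, k, w_h, w_gamma):
    """Simplified Absolute and Relative Position Embedding.

    points:  (N, 3) coordinates
    k:       number of nearest neighbors forming X_i' (assumed choice)
    w_h:     (6, d) weights of a one-layer stand-in for h (hypothetical)
    w_gamma: (d, d_out) weights of a one-layer stand-in for gamma (hypothetical)
    """
    n = points.shape[0]
    diff = points[None, :, :] - points[:, None, :]       # diff[i, j] = x_j - x_i
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)                       # enforce j != i
    nbr = np.argsort(dist, axis=1)[:, :k]                # (N, k) neighbor indices
    rel = diff[np.arange(n)[:, None], nbr]               # (N, k, 3): x_j - x_i
    absolute = np.repeat(points[:, None, :], k, axis=1)  # (N, k, 3): x_i
    pairs = np.concatenate([absolute, rel], axis=-1)     # (N, k, 6): (x_i, x_j - x_i)
    h = np.maximum(pairs @ w_h, 0.0)                     # h applied to every pair
    pooled = h.max(axis=1)                               # max over the neighbor set
    return np.maximum(pooled @ w_gamma, 0.0)             # gamma applied after pooling
```

Because the max runs over a set of neighbors, reordering the input points only reorders the output rows.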

Group Shuffle Attention (GSA)
Figure labels: Group Linear, Shuffle, GSA, Self-Attention.

GSA (1): Group Linear
Figure labels: Point 1, Point 2, Point 3.

$$c_g := c / g$$
$$x = \{x_1, x_2, \dots, x_{c_g}, x_{c_g+1}, x_{c_g+2}, \dots, x_c\}$$
$$\{\, x_{i c_g + j} \mid j = 1, \dots, c_g \,\}, \quad i = 0, \dots, g - 1$$

GSA (2): Dot-Product In-Group Attention
Figure labels: Group Linear, Non-linearity; Group 1, Group 2, Group 3; Point 1, Point 2, Point 3.

$$\mathrm{Attn}_\sigma(Q, X) := S(Q, X)\,\sigma(X)$$
$$\mathrm{GroupAttn}(X) := \mathrm{concat}\{\, \mathrm{Attn}_\sigma(X_i, X_i) \mid X_i = X^{(i)} W_i \,\}_{i=1,\dots,g}$$
$$W_i \in \mathbb{R}^{c_g \times c_g}$$
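The in-group attention can be sketched in NumPy as below. The slide does not spell out $S$ or $\sigma$, so this sketch assumes that $S(Q, X)$ is the scaled dot-product softmax $\mathrm{softmax}(Q X^T / \sqrt{c_g})$ and that $\sigma$ is an ELU; both are assumptions, and the per-group weights are random placeholders.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def elu(x):
    # Assumed non-linearity sigma.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def attn_sigma(q, x):
    # Attn_sigma(Q, X) = S(Q, X) sigma(X), with S assumed to be scaled dot-product softmax.
    s = softmax(q @ x.T / np.sqrt(x.shape[-1]), axis=-1)
    return s @ elu(x)

def group_attn(x, weights):
    """x: (N, c) point features; weights: list of g matrices of shape (c_g, c_g)."""
    g = len(weights)
    c_g = x.shape[1] // g
    outs = []
    for i, w in enumerate(weights):
        x_i = x[:, i * c_g:(i + 1) * c_g] @ w   # group linear: X_i = X^(i) W_i
        outs.append(attn_sigma(x_i, x_i))       # self-attention inside the group
    return np.concatenate(outs, axis=1)         # concat over groups -> (N, c)
```

Each group only sees its own `c_g` channels, which is why a shuffle step is needed afterwards to mix information across groups.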

GSA (3): Channel Shuffle
Figure labels: Dot-Product In-Group Attention, Group Norm, Group Linear, Channel Shuffle, Non-linearity; Group 1, Group 2, Group 3; Point 1, Point 2, Point 3.

$$\mathrm{GroupAttn}(X) \rightarrow \psi(\mathrm{GroupAttn}(X))$$

$\psi$ is the channel shuffle.
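A channel shuffle can be written as a reshape, a transpose, and a second reshape. The sketch below assumes the ShuffleNet-style interleaving, which the slide itself does not specify.

```python
import numpy as np

def channel_shuffle(x, g):
    """Interleave the channels of g groups. x: (N, c) with c divisible by g."""
    n, c = x.shape
    assert c % g == 0, "channel count must be divisible by the number of groups"
    c_g = c // g
    # (N, g, c_g) -> (N, c_g, g) -> (N, c): channel j of every group becomes adjacent.
    return x.reshape(n, g, c_g).transpose(0, 2, 1).reshape(n, c)

# Three groups of two channels: [0 1 | 2 3 | 4 5] becomes [0 2 4 1 3 5].
shuffled = channel_shuffle(np.arange(6)[None, :], 3)
```

After the shuffle, every block of consecutive channels contains one channel from each original group.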

GSA (4): Residual connection and Group Norm
Figure labels: Dot-Product In-Group Attention, Group Norm, Group Linear, Channel Shuffle, Non-linearity, +; Group 1, Group 2, Group 3; Point 1, Point 2, Point 3.

$$\mathrm{GSA}(X) = \mathrm{GN}(\psi(\mathrm{GroupAttn}(X)) + X)$$
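The composition GN(ψ(GroupAttn(X)) + X) can be sketched as follows. The attention and shuffle steps are passed in as callables so the sketch stays independent of how they are implemented. The Group Norm here is an assumption-laden simplification: it has no learnable scale or shift, and it normalizes each channel group over all points of a single cloud.

```python
import numpy as np

def group_norm(x, g, eps=1e-5):
    """Normalize each of the g channel groups of x (N, c) to zero mean, unit variance."""
    n, c = x.shape
    xg = x.reshape(n, g, c // g)
    mean = xg.mean(axis=(0, 2), keepdims=True)   # statistics per group
    var = xg.var(axis=(0, 2), keepdims=True)
    return ((xg - mean) / np.sqrt(var + eps)).reshape(n, c)

def gsa(x, g, group_attn, shuffle):
    """GSA(X) = GN(psi(GroupAttn(X)) + X); group_attn and shuffle are supplied by the caller."""
    return group_norm(shuffle(group_attn(x)) + x, g)
```

The residual term `+ x` requires the attention output to keep the input shape `(N, c)`.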

Overall architecture
Figure labels: Group Linear, Shuffle, GSA, Self-Attention, Down Sampling, Repeating for i times, Group 1, Group 2, Group 3, Shuffled Group, MLP, Nearest-neighbor Graph, Element-wise Classification Loss, Segmentation Loss.

Figure 2. Point Attention Transformer architecture for classification (top branch) and segmentation (bottom branch). The input points are first embedded into high-level representations through an Absolute and Relative Position Embedding (ARPE) module, which makes some points representative (drawn bigger in the figure). In classification, the features alternately pass through Group Shuffle Attention (GSA) blocks and down-sampling blocks, either Furthest Point Sampling (FPS) or the proposed Gumbel Subset Sampling (GSS). In segmentation, only GSA layers are used. Finally, a shared MLP is connected to every point, followed by an element-wise classification loss or segmentation loss for training.
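Furthest Point Sampling, the baseline down-sampling block named in the caption, can be sketched with the usual greedy procedure. This is a minimal NumPy sketch; the `start` index is an assumed parameter, since implementations often pick the first point at random.

```python
import numpy as np

def furthest_point_sampling(points, m, start=0):
    """Greedily pick m indices so each new point is furthest from those already chosen."""
    selected = [start]
    # Squared distance from every point to its nearest selected point.
    min_dist = np.sum((points - points[start]) ** 2, axis=1)
    for _ in range(m - 1):
        nxt = int(np.argmax(min_dist))           # furthest from the current subset
        selected.append(nxt)
        d = np.sum((points - points[nxt]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)       # update nearest-selected distances
    return np.array(selected)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [4.0, 0.0]])
idx = furthest_point_sampling(pts, 3)            # picks indices 0, 2, 3
```

The result depends on the starting point and is driven purely by geometry, so isolated outliers tend to be selected early.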

Gumbel Subset Sampling (GSS)
Figure (b), Gumbel Subset Sampling. Labels: Input, FC Layer, Transposed, Gumbel Noise, 1/τ, Softmax, Dot-Product. Tensor shapes: (N_i, c), (N_{i+1}, N_i), (N_{i+1}, N_i), (N_{i+1}, c). Annealing: τ = 1, then τ → 0+.

Paper excerpt shown on the slide (cropped at both edges): a hard and discrete selection is made end-to-end trainable with Gumbel-Softmax,

$$y_{\mathrm{gumbel}} = \mathrm{gumbel\_softmax}(w X_i^T) \cdot X_i$$

In the training phase it provides smooth gradients through the discrete reparameterization trick; with annealing it degenerates to a hard selection in the test phase. GSS is the multi-point version of this equation, that is, a distribution over subsets, and a proposition in the paper guarantees the permutation invariance of GSS.

$$X_i \in \mathbb{R}^{N_i \times c}$$

Term on the slide: Gumbel Softmax.

Gumbel-Softmax

$$\mathrm{softmax}(f_\theta(x))$$
$$f_\theta(x) = \{\pi_0, \pi_1, \dots, \pi_k\}$$
$$y_i = \frac{\exp\big((\log \pi_i + g_i)/\tau\big)}{\sum_{j=0}^{k} \exp\big((\log \pi_j + g_j)/\tau\big)}, \qquad g_i \sim -\log(-\log \mathrm{Uniform}(0, 1))$$
$$y = \mathrm{softmax}\big((\log f_\theta(x) + g)/\tau\big)$$

Source: Categorical Reparameterization with Gumbel-Softmax.
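The sampling rule above can be sketched in a few lines of NumPy. The function names and the small `eps` guard are illustrative choices, not taken from the slides.

```python
import numpy as np

def sample_gumbel(shape, rng, eps=1e-20):
    """Draw g ~ -log(-log(Uniform(0, 1))); eps guards against log(0)."""
    u = rng.uniform(0.0, 1.0, size=shape)
    return -np.log(-np.log(u + eps) + eps)

def gumbel_softmax(log_pi, tau, rng):
    """y = softmax((log pi + g) / tau), applied along the last axis."""
    g = sample_gumbel(log_pi.shape, rng)
    z = (log_pi + g) / tau
    z = z - z.max(axis=-1, keepdims=True)   # subtract the max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
pi = np.array([0.1, 0.2, 0.7])
y = gumbel_softmax(np.log(pi), tau=0.5, rng=rng)   # one relaxed sample, sums to 1
```

Each call returns a point on the probability simplex; a lower `tau` pushes it closer to a one-hot vector.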

Distribution obtained from repeated sampling

Excerpt from the Gumbel-Softmax paper shown on the slide: the density of the Gumbel-Softmax distribution (derived in Appendix B of that paper) is

$$p_{\pi,\tau}(y_1, \dots, y_k) = \Gamma(k)\,\tau^{k-1} \left( \sum_{i=1}^{k} \pi_i / y_i^{\tau} \right)^{-k} \prod_{i=1}^{k} \left( \pi_i / y_i^{\tau+1} \right)$$

This distribution was independently discovered by Maddison et al. (2016), where it is referred to as the concrete distribution. As the softmax temperature τ approaches 0, samples from the Gumbel-Softmax distribution become one-hot and the Gumbel-Softmax distribution becomes identical to the categorical distribution p(z).

Figure 1 of that paper (caption cropped on the slide): the Gumbel-Softmax distribution interpolates between discrete one-hot-encoded categorical ... Panels: a) Categorical; b) τ = 0.1, τ = 0.5, τ = 1.0, τ = 10.0. Rows: expectation, sample.

Slide annotations: $\mathrm{softmax}(f_\theta(x))$ compared with $\mathrm{softmax}\big((\log f_\theta(x) + g)/\tau\big)$; one-hot.
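The effect of the temperature can be checked empirically. The sketch below reuses the same Gumbel noise for every τ, so the only thing that changes is the sharpness of the relaxed sample; the class probabilities and sample count are arbitrary illustrative values.

```python
import numpy as np

def relaxed_samples(pi, tau, g):
    """softmax((log pi + g) / tau) for a batch of Gumbel noise g of shape (S, k)."""
    z = (np.log(pi) + g) / tau
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
pi = np.array([0.1, 0.2, 0.7])
g = -np.log(-np.log(rng.uniform(1e-12, 1.0, size=(20000, pi.size))))

# Mean size of the largest component: values near 1 mean nearly one-hot samples.
peak = {tau: relaxed_samples(pi, tau, g).max(axis=1).mean()
        for tau in (0.1, 0.5, 1.0, 10.0)}

# The argmax does not depend on tau, and its frequencies follow pi.
winners = relaxed_samples(pi, 1.0, g).argmax(axis=1)
freq = np.bincount(winners, minlength=pi.size) / len(winners)
```

`peak` grows as τ shrinks, while `freq` stays close to `pi`, which is why annealing τ toward 0 recovers categorical sampling.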

Gumbel Subset Sampling (GSS): definition

$$X_i \in \mathbb{R}^{N_i \times c}$$
$$\mathrm{GSS}(X_i) = \mathrm{gumbel\_softmax}(W X_i^T)\, X_i, \qquad W \in \mathbb{R}^{N_{i+1} \times c}$$
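A GSS layer following the shapes in the figure can be sketched as below. This is a minimal NumPy sketch with assumptions: the FC layer is a bias-free matrix `w`, its output is treated directly as logits, and the `hard` flag stands in for the test-time limit τ → 0+ by taking a one-hot argmax.

```python
import numpy as np

def gumbel_subset_sampling(x, w, tau, rng, hard=False):
    """Select N_{i+1} points from N_i. x: (N_i, c), w: (N_{i+1}, c) -> (N_{i+1}, c)."""
    logits = w @ x.T                                    # (N_{i+1}, N_i)
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                             # Gumbel noise
    z = (logits + g) / tau
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    p = p / p.sum(axis=1, keepdims=True)                # row-wise softmax over input points
    if hard:
        onehot = np.zeros_like(p)
        onehot[np.arange(p.shape[0]), p.argmax(axis=1)] = 1.0
        p = onehot                                      # discrete selection
    return p @ x                                        # (N_{i+1}, c)
```

With `hard=False` each output row is a convex combination of input points; with `hard=True` each output row is exactly one input point.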

Experiments: classification accuracy

| Method | Points | Accuracy (%) |
|---|---|---|
| DeepSets [51] | 5,000 | 90.0 |
| PointNet [30] | 1,024 | 89.2 |
| Kd-Net [19] | 1,024 | 90.6 |
| PointNet++ [32] | 1,024 | 90.7 |
| KCNet [34] | 1,024 | 91.0 |
| DGCNN [42] | 1,024 | 92.2 |
| PointCNN [23] | 1,024 | 92.2 |
| PAT (GSA only) | 1,024 | 91.3 |
| PAT (GSA only) | 256 | 90.9 |
| PAT (FPS) | 1,024 | 91.4 |
| PAT (FPS + GSS) | 1,024 | 91.7 |

Table 1. Classification performance on the ModelNet40 dataset.

| Method | Size (MB) | Time (ms) | Accuracy (%) |
|---|---|---|---|
| PointNet [30] | 40 | 25.3 | 89.2 |
| PointNet++ [32] | 12 | 163.2 | 90.7 |
| DGCNN [42] | 21 | 94.6 | 92.2 |
| PAT (GSA only) | 5 | 132.9 | 91.3 |
| PAT (FPS) | 5 | 87.6 | 91.4 |
| PAT (FPS + GSS) | 5.8 | 88.6 | 91.7 |

Table 2. Model size ("Size", MB), forward time ("Time", ms), and accuracy on the ModelNet40 dataset.

Experiments: semantic segmentation
Evaluation uses fold cross-validation on the dataset, with mean per-class IoU (mIoU) as the metric.

| Method | mIoU | mIoU on Area 5 | Size (MB) |
|---|---|---|---|
| RSNet [13] | 56.47 | - | - |
| SPGraph [20] | 62.1 | 58.04 | - |
| PointNet [30] | 47.71 | 47.6 | 4.7 |
| DGCNN [42] | 56.1 | - | 6.9 |
| PointCNN [23] | 65.39 | 57.26 | 46.2 |
| PAT | 64.28 | 60.07 | 6.1 |

Table 3. 3D semantic segmentation results on S3DIS. Mean per-class IoU (mIoU, %) is used as the evaluation metric. Model sizes are obtained using the official codes.

Slide note: highest in terms of the mIoU computed by fold.
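For reference, mean per-class IoU can be computed from predicted and ground-truth point labels as in the sketch below. Skipping classes that appear in neither array is an assumed convention; benchmark scripts may handle that case differently.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average the per-class intersection-over-union over the classes that occur."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue                       # class absent from both arrays (assumed: skip)
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1])
target = np.array([0, 1, 1, 1])
score = mean_iou(pred, target, num_classes=2)   # class 0: 1/2, class 1: 2/3
```

Because every class contributes equally, rare classes weigh as much as frequent ones, unlike overall point accuracy.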

Summary
Terms on the slide: Point Attention Transformers, Gumbel Subset Sampling.


koki madono
