Reading Circle
Self-supervised Equivariant Attention Mechanism
for Weakly Supervised Semantic Segmentation
[Wang+, CVPR20]
Semantic Segmentation
Assign a semantic category to each pixel.
Supervised
- Dataset: images with pixel-level class labels
- ✗ huge annotation cost
↓
Weakly-supervised
- Dataset: images with image-level class labels only
Weakly-Supervised SS Methods
Three steps to train with image-level labels only:
1. predict an initial category-wise response map (CAM) to localize the objects
2. refine the initial response map into pseudo ground truth (GT)
3. train the segmentation network on the pseudo labels
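The three-step pipeline above can be sketched as follows. All component names (`classifier_cam`, `refine_cam`) are hypothetical stand-ins for a real classification network, a refinement method (e.g. a CRF or pixel-affinity propagation), and a segmentation network; the CAM here is random, purely for illustration.

```python
import numpy as np

def classifier_cam(image, num_classes):
    """Step 1 (stand-in): predict a category-wise response map (CAM).
    A real network produces this from a classifier; here it is random."""
    rng = np.random.default_rng(0)
    return rng.random((num_classes, *image.shape))

def refine_cam(cam, threshold=0.5):
    """Step 2 (stand-in): refine the response map into pseudo GT labels.
    Real methods use CRFs or affinity propagation; here we just take the
    argmax and mark low-confidence pixels as 'ignore' (255)."""
    labels = cam.argmax(axis=0)
    labels[cam.max(axis=0) < threshold] = 255
    return labels

image = np.zeros((8, 8))
cam = classifier_cam(image, num_classes=3)   # step 1
pseudo_gt = refine_cam(cam)                  # step 2
# step 3: train a segmentation network on (image, pseudo_gt) pairs
print(cam.shape, pseudo_gt.shape)  # (3, 8, 8) (8, 8)
```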
What’s New
Introduce a self-supervised equivariant attention mechanism (SEAM).
- Narrow the supervision gap between fully and weakly supervised semantic segmentation
- Focus on equivariance under affine transformations
  - Previous: the CAM varies depending on the size of the input image
  - Proposed: consistent CAM regardless of the size of the input image
Slide 9
Slide 9 text
Network Architecture of SEAM
Uses a Siamese network and three kinds of losses.
Self Attention [Wang+, CVPR18]
Non-local mean operation:
  y_i = (1 / C(x)) * Σ_j f(x_i, x_j) g(x_j)
- x : input signal
- y : output signal
- g : representation function
- f : similarity function (scalar), with four variants: Gaussian, embedded Gaussian, dot product, concatenation
- C(x) : normalization factor
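The non-local operation above (embedded Gaussian variant, where the softmax over j plays the role of f/C(x)) can be sketched in NumPy. The projection matrices `W_theta`, `W_phi`, `W_g` stand in for learned 1x1 convolutions and are random here purely for illustration:

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, W_theta, W_phi, W_g):
    """Embedded-Gaussian non-local operation.

    x: (N, C) array of N positions with C channels.
    W_theta, W_phi, W_g: (C, C') projections (learned in practice).
    Returns y with y_i = sum_j softmax_j(theta(x_i) . phi(x_j)) * g(x_j).
    """
    theta = x @ W_theta                      # queries, (N, C')
    phi = x @ W_phi                          # keys,    (N, C')
    g = x @ W_g                              # values,  (N, C')
    attn = softmax(theta @ phi.T, axis=-1)   # (N, N); rows sum to 1
    return attn @ g                          # (N, C')

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))              # 6 positions, 4 channels
W = [rng.standard_normal((4, 4)) for _ in range(3)]
y = non_local(x, *W)
print(y.shape)  # (6, 4)
```

SEAM's pixel correlation module builds on this operation to refine the CAM with inter-pixel similarities.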
Loss Design of SEAM
1. Class Loss : multi-label soft margin loss, computed on the image-level class scores aggregated from the original CAM
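A minimal NumPy sketch of the multi-label soft margin loss (the same formula as PyTorch's `nn.MultiLabelSoftMarginLoss`): per-class binary cross-entropy on sigmoid(logits), averaged over classes and the batch. The toy logits and targets are illustrative only.

```python
import numpy as np

def multilabel_soft_margin_loss(logits, targets):
    """Multi-label soft margin loss.

    logits:  (N, K) raw class scores (e.g. pooled from the CAM).
    targets: (N, K) binary image-level labels.
    """
    # log(sigmoid(x)) and log(1 - sigmoid(x)), written stably
    log_sig = -np.logaddexp(0.0, -logits)        # log s(x)
    log_one_minus = -np.logaddexp(0.0, logits)   # log(1 - s(x))
    per_class = -(targets * log_sig + (1 - targets) * log_one_minus)
    return per_class.mean()

logits = np.array([[2.0, -1.0, 0.5]])
targets = np.array([[1.0, 0.0, 1.0]])
print(multilabel_soft_margin_loss(logits, targets))  # ~0.3048
```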
2. Equivariant Regularization (ER) Loss : enforces consistency between the CAM before and after the affine transformation
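The ER idea can be sketched as follows (assuming an L1 distance; the transformation here is a 2x average-pooling downscale used as a stand-in for a general affine op). The function and variable names are illustrative, not the paper's implementation:

```python
import numpy as np

def er_loss(cam_of_transformed, cam_of_original, affine):
    """Equivariance sketch: the CAM of the transformed image should
    match the transformed CAM of the original image (L1 distance)."""
    return np.abs(cam_of_transformed - affine(cam_of_original)).mean()

def downscale2(cam):
    """Stand-in affine op: 2x downscale by average pooling."""
    h, w = cam.shape
    return cam.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

cam_full = np.arange(16.0).reshape(4, 4)
cam_small = downscale2(cam_full)  # pretend CAM of the resized image
print(er_loss(cam_small, cam_full, downscale2))  # 0.0 for a perfectly equivariant CAM
```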
3. Equivariant Cross Regularization (ECR) Loss : further improves the network's ability to learn equivariance
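A sketch of the cross-regularization idea, assuming an L1 form: each branch's attention-refined CAM is regularized against the *other* branch's plain CAM (assumed here to be already aligned to a common frame), so the refinement module cannot trivially copy its own input. Names and the simplified pairing are assumptions for illustration:

```python
import numpy as np

def ecr_loss(cam_a, refined_a, cam_b, refined_b):
    """Cross-regularization sketch (L1).
    cam_a / cam_b:         plain CAMs of the two Siamese branches.
    refined_a / refined_b: the corresponding attention-refined CAMs.
    Each refined CAM is pulled toward the other branch's plain CAM."""
    return (np.abs(refined_a - cam_b).mean()
            + np.abs(refined_b - cam_a).mean())

a = np.ones((4, 4))
b = np.ones((4, 4))
print(ecr_loss(a, a, b, b))  # 0.0 when both branches already agree
```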
Dataset
PASCAL VOC 2012 semantic segmentation benchmark
- 21 categories (20 object classes + background)
- each image contains one or multiple object classes
- 1,464 training / 1,449 validation / 1,456 test images
Result -CAM-
[Figure: qualitative CAM comparison. Rows: Original, GT, Baseline, Proposed]
Result -Semantic Segmentation-
[Figure: qualitative segmentation results. Rows: Input, GT, Ours]
Quantitative Comparison
Evaluation of the affine transformations used for equivariant regularization
- evaluation metric : mIoU (%), higher is better
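For reference, mIoU averages the per-class intersection-over-union between the predicted and ground-truth label maps. A minimal sketch (classes absent from both maps are skipped, one common convention among several):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes.
    pred, gt: integer label maps of the same shape."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both maps
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])
gt = np.array([[0, 0], [1, 0]])
print(mean_iou(pred, gt, num_classes=2))  # (2/3 + 1/2) / 2 = 0.5833...
```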