

Paper reading party (ICCV 2023): End-to-End Semi-Supervised Object Detection with Soft Teacher

A paper on semi-supervised object detection


Kazuya Nishimura

April 23, 2025


Transcript

  1. Introduce paper: End-to-End Semi-Supervised Object Detection with Soft Teacher

    [Xu+, ICCV 2021] Setting of semi-supervised object detection: a small amount of labeled data (image + bounding box) and unlabeled data (image only). Goal: an object detector that outputs a bounding box + class (e.g. "dog"). How can we obtain a good object detector by effectively using the unlabeled data?
  2. Introduce paper: End-to-End Semi-Supervised Object Detection with Soft Teacher

    [Xu+, ICCV 2021] Same setting: a small amount of labeled data (image + bounding box) and unlabeled data (image only). How can we obtain a good object detector by effectively using the unlabeled data? Use Soft Teacher!
  3. Why did I select this paper?

    There are various relationships between teachers and students:
     Knowledge distillation: deep teacher model, shallow student model; the student imitates the teacher to transfer knowledge [Wang+, TPAMI 2021]
     Classification with noisy labels (weakly supervised): the teacher generates noisy labels and the student (same depth or deeper) can outperform the teacher [Xia+, CVPR 2020]
     Semi-supervised object detection (this presentation): teacher and student share the same architecture, and the teacher's weights are updated using the student's weights. This paper: [Xu+, ICCV 2021]; Honda will introduce [Peng+, ICCV 2021]
  4. Background: object detection

    e.g. Faster R-CNN, a two-stage object detector: input image → feature extractor → feature map → region proposal network (RPN) → proposals of bounding-box locations (overlap is OK).
  5. Background: object detection

    e.g. Faster R-CNN, a two-stage object detector: each RPN proposal is cropped from the feature map and fed to two heads, a regression head predicting ŷ_reg (object position: x, y, w, h) and a classification head predicting ŷ_cls (class).
  6. Background: object detection

    e.g. Faster R-CNN, a two-stage object detector: the two heads are trained with

    Loss = L_cls(y_cls, ŷ_cls) + L_reg(y_reg, ŷ_reg)

    i.e. a classification loss plus a regression loss.
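As a reading aid, the two-term detection loss above can be sketched in a few lines. This is a toy illustration with hypothetical helper names (`cls_loss` as cross-entropy, `reg_loss` as smooth L1), not the actual Faster R-CNN implementation:

```python
import math

def cls_loss(y_cls, y_cls_hat):
    # Cross-entropy for one proposal: y_cls is the true class index,
    # y_cls_hat is a list of predicted class probabilities.
    return -math.log(y_cls_hat[y_cls])

def reg_loss(y_reg, y_reg_hat):
    # Smooth L1 over the box parameters (x, y, w, h).
    total = 0.0
    for t, p in zip(y_reg, y_reg_hat):
        d = abs(t - p)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def detection_loss(y_cls, y_cls_hat, y_reg, y_reg_hat):
    # Loss = L_cls(y_cls, ŷ_cls) + L_reg(y_reg, ŷ_reg)
    return cls_loss(y_cls, y_cls_hat) + reg_loss(y_reg, y_reg_hat)
```

A perfect prediction gives zero for both terms; each head's error contributes additively.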
  7. Related work: Semi-supervised learning

     Consistency-based methods: a consistency loss between predictions on differently perturbed inputs [Jeong+, NeurIPS 2019]
     Pseudo-labeling-based methods: a model pretrained with labeled data generates pseudo labels for the unlabeled data [Tang+, WACV 2021]
    Soft Teacher is pseudo-labeling-based!
  8. Related work: Mean Teacher (EMA teacher)

    Teacher and student models share the same architecture. The student is trained with Loss_s + Loss_u, and the teacher's weights are an exponential moving average of the student's:

    θ_t^tea = α θ_t^stu + (1 − α) θ_{t−1}^tea

    [Tarvainen+, NeurIPS 2017]
  9. Related work: Mean Teacher (EMA teacher)

    Same framework: the teacher is updated as θ_t^tea = α θ_t^stu + (1 − α) θ_{t−1}^tea, and the loss is calculated based on consistency or pseudo labels. [Tarvainen+, NeurIPS 2017]
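The EMA update can be sketched as follows. This is a minimal illustration treating model parameters as plain dicts; note that the slide's formula puts α on the student, so a small α makes the teacher track the student slowly:

```python
def ema_update(teacher, student, alpha):
    # EMA teacher update as written on the slide:
    #   theta_t^tea = alpha * theta_t^stu + (1 - alpha) * theta_{t-1}^tea
    # (many Mean Teacher implementations put the large coefficient on
    #  the teacher instead; alpha here follows the slide's convention)
    return {k: alpha * student[k] + (1 - alpha) * teacher[k] for k in teacher}
```

For example, with teacher weight 0.0, student weight 1.0, and alpha = 0.1, the updated teacher weight is 0.1: the teacher drifts toward the student a little each step.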
  10. Proposed method: overview

    The Mean Teacher framework is used: student and teacher share the same architecture, the teacher is updated by θ_t^tea = α θ_t^stu + (1 − α) θ_{t−1}^tea, and the student is trained with Loss_s + Loss_u, where Loss_u = L_cls^u + L_reg^u (unsupervised classification loss + regression loss).
  11. Proposed method: overview

    Same diagram, with the unsupervised loss expanded: the student minimizes Loss_s + L_cls^u + L_reg^u.
  12. Proposed method: overview

    Weak augmentation is applied to the teacher's input and strong augmentation to the student's input.
  13. Proposed method: overview

    The teacher's detections on the weakly augmented image are filtered by a confidence threshold (keep boxes with threshold < score) to produce pseudo labels for the student.
  14. Proposed method: overview

    (Same diagram as the previous slide: the pseudo labels passing threshold < score supervise the strongly augmented student.)
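The "threshold < score" filtering step can be sketched as below. This is a hypothetical helper; representing each teacher detection as a (box, score, class) tuple is an assumption for illustration:

```python
def filter_pseudo_labels(detections, threshold=0.9):
    """Keep teacher detections whose confidence exceeds the threshold.

    detections: list of (box, score, class_name) tuples produced by the
    teacher on the weakly augmented image. The survivors are used as
    pseudo ground truth for the strongly augmented student pass.
    """
    return [d for d in detections if d[1] > threshold]
```

With a high threshold the pseudo labels are precise but sparse, which is exactly the imbalance the soft teacher (next slides) addresses.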
  15. Proposed method: overview

    Contribution 1, the soft teacher: background boxes are weighted with the score s_j / Σ_{k=1}^{N_bg} s_k.
  16. Proposed method: overview

    Contribution 2, box jittering: pseudo boxes are filtered by their regression variance.
  17. 1. Soft teacher

    The recall and precision of the selected boxes are 33% and 89%: many true foreground boxes are missed, so the model tends to be pushed toward predicting background. The teacher's detections are split by threshold < score into a pseudo-foreground box list G^fg and a pseudo-background box list G^bg, and the unsupervised classification loss averages over both:

    L_cls^u = (1/N_b^fg) Σ_{i=1}^{N_b^fg} l_cls(b_i^fg, G^fg) + (1/N_b^bg) Σ_{j=1}^{N_b^bg} l_cls(b_j^bg, G^bg)
  18. 1. Soft teacher

    Because the recall and precision of the selected boxes are 33% and 89%, the uniform background average is replaced by a per-box weighting:

    L_cls^u = (1/N_b^fg) Σ_{i=1}^{N_b^fg} l_cls(b_i^fg, G^fg) + Σ_{j=1}^{N_b^bg} w_j l_cls(b_j^bg, G^bg),   w_j = r_j / Σ_{k=1}^{N_b^bg} r_k

    The weight is calculated from the teacher's background score r_j.
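The reweighted loss above can be sketched as follows. This is a toy version assuming the per-box scalar losses and the teacher background scores r_j are already computed (both names are assumptions):

```python
def soft_cls_loss(fg_losses, bg_losses, bg_scores):
    """Soft-teacher unsupervised classification loss (scalar sketch).

    fg_losses: l_cls values for pseudo-foreground boxes (plain mean).
    bg_losses: l_cls values for pseudo-background boxes.
    bg_scores: teacher background scores r_j, one per background box.
    """
    # Foreground term: uniform average over pseudo-foreground boxes.
    fg_term = sum(fg_losses) / len(fg_losses)
    # Background term: each box j weighted by w_j = r_j / sum_k r_k,
    # so boxes the teacher is confident are background count more.
    total_r = sum(bg_scores)
    bg_term = sum((r / total_r) * l for r, l in zip(bg_scores, bg_losses))
    return fg_term + bg_term
```

With uniform scores the weighted term reduces to the plain mean; skewed scores shift the loss toward confidently-background boxes, which is the point of the soft teacher.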
  19. 2. Box Jittering

    Localization accuracy does not correlate well with the classification score: even if we select only high-score samples, their boxes are not necessarily accurate. (Figure: classification score vs. localization accuracy, with the ideal correlation shown.)
  20. 2. Box Jittering

    (Same figure as the previous slide: localization accuracy does not correlate well with the classification score, so selecting high-score samples does not guarantee accurate boxes.)
  21. 2. Box Jittering

    Localization accuracy does not correlate well with the classification score, so we want an index that correlates better with localization accuracy ➢ introduce the box regression variance!
  22. How to calculate box regression variance?

    In the Faster R-CNN pipeline (input image → feature extractor → feature map → RPN → crop → regression head), jittering is added to each bounding-box candidate before it is refined by the regression head.
  23. How to calculate box regression variance?

    (Same diagram as the previous slide: jittered bounding-box candidates are passed through the regression head.)
  24. How to calculate box regression variance?

    A candidate box b_0 from the RPN is jittered to obtain b_1, ..., b_4, and each jittered box is refined by the regression CNN. Each σ_k measures the spread of the refined boxes (terms of the form (b_k − b̄)^2), and the averaged variance

    σ̄ = (1/4) Σ_{k=1}^{4} σ_k

    is thresholded: boxes with σ̄ > 0.5 are filtered out.
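The jitter-and-measure procedure can be sketched as below. The jitter magnitude, the toy regression head, and reading the slide's 0.5 as a keep/discard threshold are assumptions:

```python
import random
import statistics

def box_regression_variance(box, regress, n_jitter=4, scale=0.5, seed=0):
    """Jitter a candidate box n_jitter times, refine each jittered box
    with the regression head `regress`, and return sigma-bar: the
    average, over the four box coordinates, of the standard deviation
    of the refined boxes."""
    rng = random.Random(seed)
    refined = []
    for _ in range(n_jitter):
        jittered = tuple(c + rng.uniform(-scale, scale) for c in box)
        refined.append(regress(jittered))
    # sigma_k: spread of coordinate k across refined boxes; average them.
    sigmas = [statistics.pstdev(coords) for coords in zip(*refined)]
    return sum(sigmas) / len(sigmas)

def keep_for_regression(box, regress, threshold=0.5):
    # Only pseudo boxes whose regression is stable under jittering are
    # used to supervise the student's regression head.
    return box_regression_variance(box, regress) < threshold
```

A regression head that snaps every jittered box back to the same location gives variance 0, i.e. a maximally reliable pseudo box.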
  25. Experiments: dataset

     MS COCO semi-supervised benchmark
    ➢ 118k training images + 123k unlabeled images
    ➢ 1%, 5%, and 10% of the images are randomly sampled from the training data as the labeled set
    ➢ 5-fold cross validation, evaluated with mAP

    Method                          | 1%           | 5%           | 10%
    Supervised (labeled data only)  | 10.00 ± 0.26 | 20.92 ± 0.15 | 26.94 ± 0.11
    [Sohn+, arXiv 2020]             | 13.97 ± 0.35 | 24.38 ± 0.12 | 28.64 ± 0.21
    Proposed                        | 20.46 ± 0.39 | 30.74 ± 0.08 | 34.04 ± 0.14
  26. Ablation study

     MS COCO, 10% labeled data
     End-to-end (E2E) training

    Soft Teacher | Box jittering | mAP
                 |               | 31.2
    ✓            |               | 33.6
    ✓            | ✓             | 34.2

    Method      | mAP
    Supervised  | 27.1
    Multi-stage | 28.7
    E2E         | 31.2
  27. Summary

     Proposed end-to-end (E2E) semi-supervised object detection
     Two techniques are used for training:
    ➢ Soft teacher, which effectively transfers the teacher's information
    ➢ Box jittering, which filters out inaccurate pseudo labels
     Outperforms SoTA methods on the MS COCO benchmark
     Take-home message:
    ➢ The teacher-student framework is used for various tasks