
CCC winter 2022: A Deep Dive into Semi-supervised Learning

YUI
December 13, 2022

Slides on semi-supervised learning, presented at CCC winter 2022.

Transcript

  1. Self-introduction • Name: 郁 青 (Yu Qing) • Hometown: Shanghai, China • Affiliation: Aizawa Laboratory (D3), Graduate School of Information Science and Technology, The University of Tokyo • Research interests: open-set recognition, semi-supervised learning, domain adaptation • HP: yu1ut.com
  2. Semi-supervised Learning • Setting: # labeled data << # unlabeled data • Training data: a few labeled samples (dog, cat) and many unlabeled samples, to which the CNN assigns pseudo labels • Test data: dog, cat
  3. Methods for Semi-supervised Learning • Consistency regularization: Π-model [Rasmus+, NeurIPS 15], Temporal Ensembling [Laine+, ICLR 17], Mean Teacher [Tarvainen+, NeurIPS 17] • State-of-the-art methods: MixMatch [Berthelot+, NeurIPS 19], ReMixMatch [Berthelot+, ICLR 20], FixMatch [Sohn+, NeurIPS 20]
  4. Methods for Semi-supervised Learning • Consistency regularization: Π-model [Rasmus+, NeurIPS 15], Temporal Ensembling [Laine+, ICLR 17], Mean Teacher [Tarvainen+, NeurIPS 17] • State-of-the-art methods: MixMatch [Berthelot+, NeurIPS 19], ReMixMatch [Berthelot+, ICLR 20], FixMatch [Sohn+, NeurIPS 20]
  5. Consistency regularization • Stochastic augmentation: random crop, horizontal flip, Gaussian noise (composable as sketched below) • Two random crops of the same image should give consistent predictions (e.g., both "Dog")
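The stochastic augmentations listed above compose in a few lines. A minimal sketch with torchvision, assuming CIFAR-sized (32×32) images; the padding and noise scale are illustrative choices, not values from the talk:

```python
import torch
from torchvision import transforms

# Stochastic augmentation: random crop, horizontal flip, Gaussian noise.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # random crop (with padding)
    transforms.RandomHorizontalFlip(),         # horizontal flip
    transforms.ToTensor(),
    # Additive Gaussian noise; the 0.05 scale is an assumption.
    transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)),
])
```

Calling `augment` twice on the same image yields two different views; consistency regularization asks the network to predict the same class (e.g. "Dog") for both.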
  6. Π-model [Rasmus+, NeurIPS 15] • Two random augmentations of an input go through the same CNN with dropout (shared parameters θ), giving two predictions, e.g. [0.1, 0.2, 0.6, 0.1] and [0.2, 0.1, 0.7, 0.0]
  7. Π-model [Rasmus+, NeurIPS 15] • For unlabeled data, a mean squared error (MSE) loss between the two predictions ([0.1, 0.2, 0.6, 0.1] and [0.2, 0.1, 0.7, 0.0]) is backpropagated
  8. Π-model [Rasmus+, NeurIPS 15] • For labeled data, a cross-entropy loss between the prediction and the ground-truth label (e.g. [0, 0, 1, 0]) is backpropagated in addition to the MSE consistency loss (a training-step sketch follows)
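Putting items 6-8 together, one Π-model training step might look like the following sketch; `augment` stands for any batch-wise stochastic augmentation, and the consistency weight `w` is an assumption (the paper ramps it up over training):

```python
import torch.nn.functional as F

def pi_model_step(model, x_labeled, y, x_unlabeled, augment, w):
    """One Π-model step: cross-entropy on labeled data plus an MSE
    consistency loss between two augmented views of unlabeled data."""
    # Labeled branch: standard cross-entropy against the ground-truth label.
    sup_loss = F.cross_entropy(model(augment(x_labeled)), y)

    # Unlabeled branch: two stochastic augmentations through the same
    # network (dropout active), MSE between the two softmax outputs.
    p1 = F.softmax(model(augment(x_unlabeled)), dim=1)
    p2 = F.softmax(model(augment(x_unlabeled)), dim=1)
    cons_loss = F.mse_loss(p1, p2)

    return sup_loss + w * cons_loss   # back-propagated together
```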
  9. Temporal Ensembling [Laine+, ICLR 17] • For unlabeled data, one random augmentation goes through the CNN with dropout; an MSE loss between its prediction (e.g. [0.2, 0.1, 0.7, 0.0]) and a stored ensemble output (e.g. [0.1, 0.2, 0.6, 0.1]) is backpropagated
  10. Temporal Ensembling [Laine+, ICLR 17] • The ensemble output Z_i is updated from the current prediction z_i: Z_{i+1} = α × Z_i + (1 − α) × z_i
  11. Temporal Ensembling [Laine+, ICLR 17] • For labeled data, a cross-entropy loss against the ground-truth label (e.g. [0, 0, 1, 0]) is backpropagated together with the MSE loss against the ensemble output (an update sketch follows)
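A sketch of the ensemble-output bookkeeping from items 9-11: `Z` holds one running prediction per training sample and is refreshed with the rule Z_{i+1} = α × Z_i + (1 − α) × z_i. The bias correction by (1 − α^t) follows the paper; treating the index as the epoch count is an assumption:

```python
import torch

def update_ensemble(Z, z_batch, idx, alpha, epoch):
    """Update the stored ensemble outputs Z for the samples in `idx`
    and return the bias-corrected targets for the MSE loss."""
    Z[idx] = alpha * Z[idx] + (1.0 - alpha) * z_batch.detach()
    # Startup bias correction, as in Temporal Ensembling [Laine+, ICLR 17].
    return Z[idx] / (1.0 - alpha ** (epoch + 1))
```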
  12. Mean Teacher [Tarvainen+, NeurIPS 17] • Two random augmentations of an input go to a (student) CNN and a teacher CNN, giving predictions [0.1, 0.2, 0.6, 0.1] and [0.2, 0.1, 0.7, 0.0]
  13. Mean Teacher [Tarvainen+, NeurIPS 17] • The teacher CNN produces the more accurate prediction
  14. Mean Teacher [Tarvainen+, NeurIPS 17] • An MSE loss between the student CNN's prediction and the teacher's more accurate result is backpropagated (only the student is updated by gradients)
  15. Mean Teacher [Tarvainen+, NeurIPS 17] • Train: for labeled data, a cross-entropy loss between the student's prediction and the ground-truth label (e.g. [0, 1, 0, 0]) is backpropagated in addition to the MSE loss against the teacher's result
  16. Mean Teacher [Tarvainen+, NeurIPS 17] • Question: how is the teacher CNN generated?
  17. Mean Teacher [Tarvainen+, NeurIPS 17] • The teacher CNN has the same structure as the student CNN
  18. Mean Teacher [Tarvainen+, NeurIPS 17] • The teacher's parameters θ′ are an exponential moving average of the student's parameters θ over past iterations: θ′_t = α × θ′_{t−1} + (1 − α) × θ_{t−1}, with α = 0.999 (a code sketch follows)
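The EMA update on this slide translates directly into code; a minimal sketch, assuming a student/teacher pair with identical structure:

```python
import torch

@torch.no_grad()   # the teacher is never updated by gradients
def update_teacher(teacher, student, alpha=0.999):
    """Mean Teacher update: θ′_t = α × θ′_{t−1} + (1 − α) × θ_{t−1}."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```

`update_teacher(teacher, student)` is called once after every student optimization step, so the teacher averages the student's parameters over past iterations.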
  19. Experiment • Network: 13-layer ConvNet • Datasets:
      CIFAR-10 (#Labeled / #Unlabeled / #Test):
        1,000 / 49,000 / 10,000
        2,000 / 48,000 / 10,000
        4,000 / 46,000 / 10,000
        50,000 / 0 / 10,000
      SVHN (#Labeled / #Unlabeled / #Test):
        250 / 73,007 / 26,032
        500 / 72,757 / 26,032
        1,000 / 72,257 / 26,032
        73,257 / 0 / 26,032
  20. Result (CIFAR-10) • Error rate (%), with 1,000 / 2,000 / 4,000 / 50,000 labeled:
      Π-model: N/A / N/A / 12.36 ± 0.31 / 5.56 ± 0.10
      Temporal Ensembling: N/A / N/A / 12.16 ± 0.31 / 5.60 ± 0.10
      Supervised-only: 46.43 ± 1.21 / 33.94 ± 0.73 / 20.66 ± 0.57 / 5.82 ± 0.15
      Π-model (replicated): 27.36 ± 1.20 / 18.02 ± 0.60 / 13.20 ± 0.27 / 6.06 ± 0.11
      Mean Teacher: 21.55 ± 1.48 / 15.73 ± 0.31 / 12.31 ± 0.28 / 5.94 ± 0.15
  21. Result (SVHN) • Error rate (%), with 250 / 500 / 1,000 / 73,257 labeled:
      Π-model: N/A / 6.65 ± 0.53 / 4.82 ± 0.17 / 2.54 ± 0.04
      Temporal Ensembling: N/A / 5.12 ± 0.13 / 4.42 ± 0.16 / 2.74 ± 0.06
      Supervised-only: 27.77 ± 3.18 / 16.88 ± 1.30 / 12.32 ± 0.95 / 2.75 ± 0.10
      Π-model (replicated): 9.69 ± 0.92 / 6.83 ± 0.66 / 4.95 ± 0.26 / 2.50 ± 0.07
      Mean Teacher: 4.35 ± 0.50 / 4.18 ± 0.27 / 3.95 ± 0.19 / 2.50 ± 0.05
  22. Methods for Semi-supervised Learning • Consistency regularization: Π-model [Rasmus+, NeurIPS 15], Temporal Ensembling [Laine+, ICLR 17], Mean Teacher [Tarvainen+, NeurIPS 17] • State-of-the-art methods: MixMatch [Berthelot+, NeurIPS 19], ReMixMatch [Berthelot+, ICLR 20], FixMatch [Sohn+, NeurIPS 20]
  23. State-of-the-art methods • Stochastic augmentation: random crop, horizontal flip, Gaussian noise (weak augmentation) • Auto augmentation [Cubuk+, CVPR 19][Lim+, NeurIPS 19][Cubuk+, CVPRW 20], e.g. AutoAugment [Cubuk+, CVPR 19] (strong augmentation); a weak/strong pipeline sketch follows
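A weak/strong augmentation pair might be set up as below; torchvision's RandAugment stands in for the learned auto-augmentation policies, and its parameters are assumptions:

```python
from torchvision import transforms

# Weak augmentation: the basic stochastic transforms.
weak_augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Strong augmentation: an aggressive learned-policy-style transform on top.
strong_augment = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=10),
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```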
  24. State-of-the-art methods • Stochastic augmentation: random crop, horizontal flip, Gaussian noise • Auto augmentation [Cubuk+, CVPR 19][Lim+, NeurIPS 19][Cubuk+, CVPRW 20] • MixUp [Zhang+, ICLR 18]: blend two images (e.g. 0.7 * dog + 0.3 * cat) and blend their labels the same way (sketched below)
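MixUp itself is a two-line operation; a minimal sketch, where the Beta(1, 1) mixing distribution is an assumption (the sampled coefficient plays the role of the 0.7 / 0.3 split above):

```python
import torch

def mixup(x1, y1, x2, y2, beta=1.0):
    """Blend two images and their one-hot labels with the same ratio."""
    lam = torch.distributions.Beta(beta, beta).sample()
    x = lam * x1 + (1 - lam) * x2   # e.g. 0.7 * dog image + 0.3 * cat image
    y = lam * y1 + (1 - lam) * y2   # e.g. 0.7 * dog label + 0.3 * cat label
    return x, y
```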
  25. MixMatch [Berthelot+, NeurIPS 19] • Training: MixUp an unlabeled sample with pseudo label "dog" and a labeled sample with label "cat" into a mixed image with mixed label 0.7 * dog + 0.3 * cat; the CNN is trained on the mixed pair (a pseudo-labeling sketch follows)
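The pseudo label that enters MixMatch's MixUp step comes from label guessing; a compressed sketch using the paper's defaults, K = 2 augmentations and temperature T = 0.5, with `augment` as sketched earlier:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def guess_label(model, x_unlabeled, augment, K=2, T=0.5):
    """Average predictions over K augmentations, then sharpen with T."""
    probs = torch.stack(
        [F.softmax(model(augment(x_unlabeled)), dim=1) for _ in range(K)]
    ).mean(dim=0)
    sharpened = probs ** (1.0 / T)                    # temperature sharpening
    return sharpened / sharpened.sum(dim=1, keepdim=True)
```

The guessed label and the unlabeled image are then mixed with a labeled pair via `mixup` and trained on, as on the slide.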
  26. ReMixMatch [Berthelot+, ICLR 20] • Label guessing: the CNN's prediction on a weakly augmented view of an unlabeled sample is used as the pseudo label for its strongly augmented views
  27. ReMixMatch [Berthelot+, ICLR 20] • Training: as in MixMatch, MixUp an unlabeled sample with pseudo label "dog" and a labeled sample with label "cat", and train the CNN on the mixed image with mixed label 0.7 * dog + 0.3 * cat
  28. FixMatch [Sohn+, NeurIPS 20] • Training: the prediction on a weakly augmented view of an unlabeled sample is converted into a one-hot pseudo label; a cross-entropy loss between that pseudo label and the prediction on a strongly augmented view is backpropagated; all views go through the same network (a loss sketch follows)
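FixMatch's unlabeled objective is compact enough to sketch in full; the confidence threshold τ = 0.95 is the paper's default, and `weak_augment` / `strong_augment` are as sketched above:

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_unlabeled,
                            weak_augment, strong_augment, tau=0.95):
    """Pseudo-label the weak view; train the strong view against it."""
    with torch.no_grad():                        # no gradient through targets
        probs = F.softmax(model(weak_augment(x_unlabeled)), dim=1)
        conf, pseudo = probs.max(dim=1)          # one-hot pseudo label
        mask = (conf >= tau).float()             # keep confident samples only
    logits = model(strong_augment(x_unlabeled))  # same network, strong view
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (mask * loss).mean()
```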
  29. Experiment • Problems: different hyperparameters, different neural networks, different deep learning frameworks → how can the methods be compared fairly?
  30. Experiment • Realistic Evaluation of Semi-Supervised Learning [Oliver+, NeurIPS 18]: same model (Wide ResNet-28-2 [Zagoruyko+, BMVC 16]), same training protocol (optimizer, learning rate schedule, data preprocessing), same code base, same hyperparameter optimization • Datasets:
      CIFAR-10 (#Labeled / #Unlabeled / #Test):
        40 / 49,960 / 10,000
        250 / 49,750 / 10,000
        1,000 / 49,000 / 10,000
      CIFAR-100 (#Labeled / #Unlabeled / #Test):
        400 / 49,600 / 10,000
        2,500 / 47,500 / 10,000
        10,000 / 40,000 / 10,000
      SVHN (#Labeled / #Unlabeled / #Test):
        40 / 73,217 / 26,032
        250 / 73,007 / 26,032
        1,000 / 72,257 / 26,032
  31. Result (CIFAR-10) • Error rate (%), with 40 / 250 / 4,000 labeled:
      Π-model: N/A / 54.26 ± 3.97 / 14.01 ± 0.38
      Mean Teacher: N/A / 32.32 ± 2.30 / 9.19 ± 0.19
      MixMatch: 47.54 ± 11.50 / 11.05 ± 0.86 / 6.42 ± 0.10
      ReMixMatch: 19.10 ± 9.64 / 5.44 ± 0.05 / 4.72 ± 0.13
      FixMatch (RA): 13.81 ± 3.37 / 5.07 ± 0.65 / 4.26 ± 0.05
      FixMatch (CTA): 11.39 ± 3.35 / 5.07 ± 0.33 / 4.31 ± 0.15
      RA: RandAugment [Cubuk+, CVPRW 20]; CTA: Control Theory Augment [Berthelot+, ICLR 20]
  32. Result (CIFAR-100) • Error rate (%), with 400 / 2,500 / 10,000 labeled:
      Π-model: N/A / 57.25 ± 0.48 / 37.88 ± 0.11
      Mean Teacher: N/A / 53.91 ± 0.57 / 35.83 ± 0.24
      MixMatch: 67.61 ± 1.32 / 39.94 ± 0.37 / 28.31 ± 0.33
      ReMixMatch: 44.28 ± 2.06 / 27.43 ± 0.31 / 23.03 ± 0.56
      FixMatch (RA): 48.85 ± 1.75 / 28.29 ± 0.11 / 22.60 ± 0.12
      FixMatch (CTA): 49.95 ± 3.01 / 28.64 ± 0.24 / 23.18 ± 0.11
      RA: RandAugment [Cubuk+, CVPRW 20]; CTA: Control Theory Augment [Berthelot+, ICLR 20]
      ReMixMatch performs best on CIFAR-100
  33. Result (SVHN) • Error rate (%), with 40 / 250 / 1,000 labeled:
      Π-model: N/A / 18.96 ± 1.92 / 7.54 ± 0.36
      Mean Teacher: N/A / 3.57 ± 0.11 / 3.42 ± 0.07
      MixMatch: 42.55 ± 14.53 / 3.98 ± 0.23 / 3.50 ± 0.28
      ReMixMatch: 3.34 ± 0.20 / 2.92 ± 0.48 / 2.65 ± 0.08
      FixMatch (RA): 3.96 ± 2.17 / 2.48 ± 0.38 / 2.28 ± 0.11
      FixMatch (CTA): 7.65 ± 7.65 / 2.64 ± 0.64 / 2.36 ± 0.19
      RA: RandAugment [Cubuk+, CVPRW 20]; CTA: Control Theory Augment [Berthelot+, ICLR 20]
  34. Recent methods • Dash [Xu+, ICML 21] • CoMatch [Li+, ICCV 21] • FlexMatch [Zhang+, NeurIPS 21] • SimMatch [Zheng+, CVPR 22]
  35. Open-set Semi-supervised Learning • Setting: # labeled data << # unlabeled data • Training data: labeled dog/cat samples plus unlabeled samples • Test data: dog + cat
  36. Open-set Semi-supervised Learning • # labeled data << # unlabeled data, but the unlabeled data also contains outliers (samples outside dog + cat), which damage the CNN's training
  37. Open-set Semi-supervised Learning • Goal: detect outliers in the unlabeled data when labeled data is limited, so that they do not damage training
  38. Methods for Open-set Semi-supervised Learning • Multi-Task Curriculum Framework [Yu+, ECCV 20] • OpenMatch [Saito+, NeurIPS 21] • Trash to Treasure [Huang+, ICCV 21] • Safe-Student [He+, CVPR 22]
  39. Multi-Task Curriculum Framework [Yu+, ECCV 20] • Framework: detect the outliers, then perform semi-supervised learning without them (a filtering sketch follows)
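The two-stage recipe on this slide reduces to scoring and splitting the unlabeled pool; a hypothetical sketch, where the outlier score and the 0.5 threshold are placeholders for the method-specific parts:

```python
import torch

def split_unlabeled(x_unlabeled, outlier_scores, threshold=0.5):
    """Keep likely inliers for semi-supervised training, drop the rest."""
    inlier_mask = outlier_scores < threshold    # threshold is an assumption
    return x_unlabeled[inlier_mask], x_unlabeled[~inlier_mask]
```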
  40. Recent topics in Semi-supervised Learning • Benchmark performance ≠ real-world performance • vs. self-supervised learning: when the dataset is large enough, is fine-tuning a self-supervised model sufficient?
  41. Recent topics in Semi-supervised Learning • Benchmark performance ≠ real-world performance • vs. self-supervised learning: when the dataset is large enough, is fine-tuning a self-supervised model sufficient? • vs. zero-shot prediction (e.g. CLIP): is CLIP sufficient for most datasets?
  42. Conclusion • Semi-supervised learning leverages unlabeled data to train a model when only limited labeled data is available. • Recent methods can train a high-performance model even with only 40 labeled samples. • Self-supervised learning and pretrained vision-and-language models are now challenging semi-supervised learning.