
CCC winter 2022: A Deep Dive into Semi-supervised Learning

YUI
December 13, 2022

Slides on semi-supervised learning, presented at CCC winter 2022.

Transcript

  1. Self-introduction • Name: 郁 青 (Yu Qing) • Hometown: Shanghai, China • Affiliation: Aizawa Laboratory (D3), Graduate School of Information Science and Technology, The University of Tokyo • Research interests: open-set recognition, semi-supervised learning, domain adaptation • HP: yu1ut.com
  2. Semi-supervised Learning • Setting: # labeled data << # unlabeled data • Training data: a few labeled samples (dog, cat) and many unlabeled samples, to which the CNN assigns pseudo labels • Test data: dog, cat
  3. Methods for Semi-supervised Learning • Consistency regularization: Π-model [Rasmus+, NeurIPS 15], Temporal Ensembling [Laine+, ICLR 17], Mean Teacher [Tarvainen+, NeurIPS 17] • State-of-the-art methods: MixMatch [Berthelot+, NeurIPS 19], ReMixMatch [Berthelot+, ICLR 20], FixMatch [Sohn+, NeurIPS 20]
  4. Methods for Semi-supervised Learning • Consistency regularization: Π-model [Rasmus+, NeurIPS 15], Temporal Ensembling [Laine+, ICLR 17], Mean Teacher [Tarvainen+, NeurIPS 17] • State-of-the-art methods: MixMatch [Berthelot+, NeurIPS 19], ReMixMatch [Berthelot+, ICLR 20], FixMatch [Sohn+, NeurIPS 20]
  5. Consistency regularization • Stochastic augmentation: random crop, horizontal flip, Gaussian noise (composable as sketched below) • Two random crops of the same image should give consistent predictions (e.g., both "Dog")
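The stochastic augmentations listed above compose in a few lines. A minimal sketch with torchvision, assuming CIFAR-sized (32×32) images; the padding and noise scale are illustrative choices, not values from the talk:

```python
import torch
from torchvision import transforms

# Stochastic augmentation: random crop, horizontal flip, Gaussian noise.
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # random crop (with padding)
    transforms.RandomHorizontalFlip(),         # horizontal flip
    transforms.ToTensor(),
    # Additive Gaussian noise; the 0.05 scale is an assumption.
    transforms.Lambda(lambda x: x + 0.05 * torch.randn_like(x)),
])
```

Calling `augment` twice on the same image yields two different views; consistency regularization asks the network to predict the same class (e.g. "Dog") for both.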
  6. Π-model [Rasmus+, NeurIPS 15] • Two random augmentations of an input go through the same CNN with dropout (shared parameters θ), giving two predictions, e.g. [0.1, 0.2, 0.6, 0.1] and [0.2, 0.1, 0.7, 0.0]
  7. Π-model [Rasmus+, NeurIPS 15] • For unlabeled data, a mean squared error (MSE) loss between the two predictions ([0.1, 0.2, 0.6, 0.1] and [0.2, 0.1, 0.7, 0.0]) is backpropagated
  8. Π-model [Rasmus+, NeurIPS 15] • For labeled data, a cross-entropy loss between the prediction and the ground-truth label (e.g. [0, 0, 1, 0]) is backpropagated in addition to the MSE consistency loss (a training-step sketch follows)
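Putting items 6-8 together, one Π-model training step might look like the following sketch; `augment` stands for any batch-wise stochastic augmentation, and the consistency weight `w` is an assumption (the paper ramps it up over training):

```python
import torch.nn.functional as F

def pi_model_step(model, x_labeled, y, x_unlabeled, augment, w):
    """One Π-model step: cross-entropy on labeled data plus an MSE
    consistency loss between two augmented views of unlabeled data."""
    # Labeled branch: standard cross-entropy against the ground-truth label.
    sup_loss = F.cross_entropy(model(augment(x_labeled)), y)

    # Unlabeled branch: two stochastic augmentations through the same
    # network (dropout active), MSE between the two softmax outputs.
    p1 = F.softmax(model(augment(x_unlabeled)), dim=1)
    p2 = F.softmax(model(augment(x_unlabeled)), dim=1)
    cons_loss = F.mse_loss(p1, p2)

    return sup_loss + w * cons_loss   # back-propagated together
```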
  9. Temporal Ensembling [Laine+, ICLR 17] • For unlabeled data, one random augmentation goes through the CNN with dropout; an MSE loss between its prediction (e.g. [0.2, 0.1, 0.7, 0.0]) and a stored ensemble output (e.g. [0.1, 0.2, 0.6, 0.1]) is backpropagated
  10. Temporal Ensembling [Laine+, ICLR 17] • The ensemble output Z_i is updated from the current prediction z_i: Z_{i+1} = α × Z_i + (1 − α) × z_i
  11. Temporal Ensembling [Laine+, ICLR 17] • For labeled data, a cross-entropy loss against the ground-truth label (e.g. [0, 0, 1, 0]) is backpropagated together with the MSE loss against the ensemble output (an update sketch follows)
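A sketch of the ensemble-output bookkeeping from items 9-11: `Z` holds one running prediction per training sample and is refreshed with the rule Z_{i+1} = α × Z_i + (1 − α) × z_i. The bias correction by (1 − α^t) follows the paper; treating the index as the epoch count is an assumption:

```python
import torch

def update_ensemble(Z, z_batch, idx, alpha, epoch):
    """Update the stored ensemble outputs Z for the samples in `idx`
    and return the bias-corrected targets for the MSE loss."""
    Z[idx] = alpha * Z[idx] + (1.0 - alpha) * z_batch.detach()
    # Startup bias correction, as in Temporal Ensembling [Laine+, ICLR 17].
    return Z[idx] / (1.0 - alpha ** (epoch + 1))
```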
  12. Mean Teacher [Tarvainen+, NeurIPS 17] • Two random augmentations of an input go to a (student) CNN and a teacher CNN, giving predictions [0.1, 0.2, 0.6, 0.1] and [0.2, 0.1, 0.7, 0.0]
  13. Mean Teacher [Tarvainen+, NeurIPS 17] • The teacher CNN produces the more accurate prediction
  14. Mean Teacher [Tarvainen+, NeurIPS 17] • An MSE loss between the student CNN's prediction and the teacher's more accurate result is backpropagated (only the student is updated by gradients)
  15. Mean Teacher [Tarvainen+, NeurIPS 17] • Train: for labeled data, a cross-entropy loss between the student's prediction and the ground-truth label (e.g. [0, 1, 0, 0]) is backpropagated in addition to the MSE loss against the teacher's result
  16. Mean Teacher [Tarvainen+, NeurIPS 17] • Question: how is the teacher CNN generated?
  17. Mean Teacher [Tarvainen+, NeurIPS 17] • The teacher CNN has the same structure as the student CNN
  18. Mean Teacher [Tarvainen+, NeurIPS 17] • The teacher's parameters θ′ are an exponential moving average of the student's parameters θ over past iterations: θ′_t = α × θ′_{t−1} + (1 − α) × θ_{t−1}, with α = 0.999 (a code sketch follows)
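The EMA update on this slide translates directly into code; a minimal sketch, assuming a student/teacher pair with identical structure:

```python
import torch

@torch.no_grad()   # the teacher is never updated by gradients
def update_teacher(teacher, student, alpha=0.999):
    """Mean Teacher update: θ′_t = α × θ′_{t−1} + (1 − α) × θ_{t−1}."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(alpha).add_(s_param, alpha=1.0 - alpha)
```

`update_teacher(teacher, student)` is called once after every student optimization step, so the teacher averages the student's parameters over past iterations.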
  19. Experiment • Network: 13-layer ConvNet • Datasets:
      CIFAR-10 (#Labeled / #Unlabeled / #Test):
        1,000 / 49,000 / 10,000
        2,000 / 48,000 / 10,000
        4,000 / 46,000 / 10,000
        50,000 / 0 / 10,000
      SVHN (#Labeled / #Unlabeled / #Test):
        250 / 73,007 / 26,032
        500 / 72,757 / 26,032
        1,000 / 72,257 / 26,032
        73,257 / 0 / 26,032
  20. Result (CIFAR-10) • Error rate (%), with 1,000 / 2,000 / 4,000 / 50,000 labeled:
      Π-model: N/A / N/A / 12.36 ± 0.31 / 5.56 ± 0.10
      Temporal Ensembling: N/A / N/A / 12.16 ± 0.31 / 5.60 ± 0.10
      Supervised-only: 46.43 ± 1.21 / 33.94 ± 0.73 / 20.66 ± 0.57 / 5.82 ± 0.15
      Π-model (replicated): 27.36 ± 1.20 / 18.02 ± 0.60 / 13.20 ± 0.27 / 6.06 ± 0.11
      Mean Teacher: 21.55 ± 1.48 / 15.73 ± 0.31 / 12.31 ± 0.28 / 5.94 ± 0.15
  21. Result (SVHN) • Error rate (%), with 250 / 500 / 1,000 / 73,257 labeled:
      Π-model: N/A / 6.65 ± 0.53 / 4.82 ± 0.17 / 2.54 ± 0.04
      Temporal Ensembling: N/A / 5.12 ± 0.13 / 4.42 ± 0.16 / 2.74 ± 0.06
      Supervised-only: 27.77 ± 3.18 / 16.88 ± 1.30 / 12.32 ± 0.95 / 2.75 ± 0.10
      Π-model (replicated): 9.69 ± 0.92 / 6.83 ± 0.66 / 4.95 ± 0.26 / 2.50 ± 0.07
      Mean Teacher: 4.35 ± 0.50 / 4.18 ± 0.27 / 3.95 ± 0.19 / 2.50 ± 0.05
  22. Methods for Semi-supervised Learning • Consistency regularization: Π-model [Rasmus+, NeurIPS 15], Temporal Ensembling [Laine+, ICLR 17], Mean Teacher [Tarvainen+, NeurIPS 17] • State-of-the-art methods: MixMatch [Berthelot+, NeurIPS 19], ReMixMatch [Berthelot+, ICLR 20], FixMatch [Sohn+, NeurIPS 20]
  23. State-of-the-art methods • Stochastic augmentation: random crop, horizontal flip, Gaussian noise (weak augmentation) • Auto augmentation [Cubuk+, CVPR 19][Lim+, NeurIPS 19][Cubuk+, CVPRW 20], e.g. AutoAugment [Cubuk+, CVPR 19] (strong augmentation); a weak/strong pipeline sketch follows
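A weak/strong augmentation pair might be set up as below; torchvision's RandAugment stands in for the learned auto-augmentation policies, and its parameters are assumptions:

```python
from torchvision import transforms

# Weak augmentation: the basic stochastic transforms.
weak_augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Strong augmentation: an aggressive learned-policy-style transform on top.
strong_augment = transforms.Compose([
    transforms.RandAugment(num_ops=2, magnitude=10),
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```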
  24. State-of-the-art methods • Stochastic augmentation: random crop, horizontal flip, Gaussian noise • Auto augmentation [Cubuk+, CVPR 19][Lim+, NeurIPS 19][Cubuk+, CVPRW 20] • MixUp [Zhang+, ICLR 18]: blend two images (e.g. 0.7 * dog + 0.3 * cat) and blend their labels the same way (sketched below)
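MixUp itself is a two-line operation; a minimal sketch, where the Beta(1, 1) mixing distribution is an assumption (the sampled coefficient plays the role of the 0.7 / 0.3 split above):

```python
import torch

def mixup(x1, y1, x2, y2, beta=1.0):
    """Blend two images and their one-hot labels with the same ratio."""
    lam = torch.distributions.Beta(beta, beta).sample()
    x = lam * x1 + (1 - lam) * x2   # e.g. 0.7 * dog image + 0.3 * cat image
    y = lam * y1 + (1 - lam) * y2   # e.g. 0.7 * dog label + 0.3 * cat label
    return x, y
```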
  25. MixMatch [Berthelot+, NeurIPS 19] • Training: MixUp an unlabeled sample with pseudo label "dog" and a labeled sample with label "cat" into a mixed image with mixed label 0.7 * dog + 0.3 * cat; the CNN is trained on the mixed pair (a pseudo-labeling sketch follows)
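The pseudo label that enters MixMatch's MixUp step comes from label guessing; a compressed sketch using the paper's defaults, K = 2 augmentations and temperature T = 0.5, with `augment` as sketched earlier:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def guess_label(model, x_unlabeled, augment, K=2, T=0.5):
    """Average predictions over K augmentations, then sharpen with T."""
    probs = torch.stack(
        [F.softmax(model(augment(x_unlabeled)), dim=1) for _ in range(K)]
    ).mean(dim=0)
    sharpened = probs ** (1.0 / T)                    # temperature sharpening
    return sharpened / sharpened.sum(dim=1, keepdim=True)
```

The guessed label and the unlabeled image are then mixed with a labeled pair via `mixup` and trained on, as on the slide.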
  26. ReMixMatch [Berthelot+, ICLR 20] • Label guessing: the CNN's prediction on a weakly augmented view of an unlabeled sample is used as the pseudo label for its strongly augmented views
  27. ReMixMatch [Berthelot+, ICLR 20] • Training: as in MixMatch, MixUp an unlabeled sample with pseudo label "dog" and a labeled sample with label "cat", and train the CNN on the mixed image with mixed label 0.7 * dog + 0.3 * cat
  28. FixMatch [Sohn+, NeurIPS 20] • Training: the prediction on a weakly augmented view of an unlabeled sample is converted into a one-hot pseudo label; a cross-entropy loss between that pseudo label and the prediction on a strongly augmented view is backpropagated; all views go through the same network (a loss sketch follows)
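FixMatch's unlabeled objective is compact enough to sketch in full; the confidence threshold τ = 0.95 is the paper's default, and `weak_augment` / `strong_augment` are as sketched above:

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_unlabeled,
                            weak_augment, strong_augment, tau=0.95):
    """Pseudo-label the weak view; train the strong view against it."""
    with torch.no_grad():                        # no gradient through targets
        probs = F.softmax(model(weak_augment(x_unlabeled)), dim=1)
        conf, pseudo = probs.max(dim=1)          # one-hot pseudo label
        mask = (conf >= tau).float()             # keep confident samples only
    logits = model(strong_augment(x_unlabeled))  # same network, strong view
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (mask * loss).mean()
```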
  29. Experiment • Problems: different hyperparameters, different neural networks, different deep learning frameworks → how can the methods be compared fairly?
  30. Experiment • Realistic Evaluation of Semi-Supervised Learning [Oliver+, NeurIPS 18]: same model (Wide ResNet-28-2 [Zagoruyko+, BMVC 16]), same training protocol (optimizer, learning rate schedule, data preprocessing), same code base, same hyperparameter optimization • Datasets:
      CIFAR-10 (#Labeled / #Unlabeled / #Test):
        40 / 49,960 / 10,000
        250 / 49,750 / 10,000
        1,000 / 49,000 / 10,000
      CIFAR-100 (#Labeled / #Unlabeled / #Test):
        400 / 49,600 / 10,000
        2,500 / 47,500 / 10,000
        10,000 / 40,000 / 10,000
      SVHN (#Labeled / #Unlabeled / #Test):
        40 / 73,217 / 26,032
        250 / 73,007 / 26,032
        1,000 / 72,257 / 26,032
  31. Result (CIFAR-10) • Error rate (%), with 40 / 250 / 4,000 labeled:
      Π-model: N/A / 54.26 ± 3.97 / 14.01 ± 0.38
      Mean Teacher: N/A / 32.32 ± 2.30 / 9.19 ± 0.19
      MixMatch: 47.54 ± 11.50 / 11.05 ± 0.86 / 6.42 ± 0.10
      ReMixMatch: 19.10 ± 9.64 / 5.44 ± 0.05 / 4.72 ± 0.13
      FixMatch (RA): 13.81 ± 3.37 / 5.07 ± 0.65 / 4.26 ± 0.05
      FixMatch (CTA): 11.39 ± 3.35 / 5.07 ± 0.33 / 4.31 ± 0.15
      RA: RandAugment [Cubuk+, CVPRW 20]; CTA: Control Theory Augment [Berthelot+, ICLR 20]
  32. Result (CIFAR-100) • Error rate (%), with 400 / 2,500 / 10,000 labeled:
      Π-model: N/A / 57.25 ± 0.48 / 37.88 ± 0.11
      Mean Teacher: N/A / 53.91 ± 0.57 / 35.83 ± 0.24
      MixMatch: 67.61 ± 1.32 / 39.94 ± 0.37 / 28.31 ± 0.33
      ReMixMatch: 44.28 ± 2.06 / 27.43 ± 0.31 / 23.03 ± 0.56
      FixMatch (RA): 48.85 ± 1.75 / 28.29 ± 0.11 / 22.60 ± 0.12
      FixMatch (CTA): 49.95 ± 3.01 / 28.64 ± 0.24 / 23.18 ± 0.11
      RA: RandAugment [Cubuk+, CVPRW 20]; CTA: Control Theory Augment [Berthelot+, ICLR 20]
      ReMixMatch performs best on CIFAR-100
  33. Result (SVHN) • Error rate (%), with 40 / 250 / 1,000 labeled:
      Π-model: N/A / 18.96 ± 1.92 / 7.54 ± 0.36
      Mean Teacher: N/A / 3.57 ± 0.11 / 3.42 ± 0.07
      MixMatch: 42.55 ± 14.53 / 3.98 ± 0.23 / 3.50 ± 0.28
      ReMixMatch: 3.34 ± 0.20 / 2.92 ± 0.48 / 2.65 ± 0.08
      FixMatch (RA): 3.96 ± 2.17 / 2.48 ± 0.38 / 2.28 ± 0.11
      FixMatch (CTA): 7.65 ± 7.65 / 2.64 ± 0.64 / 2.36 ± 0.19
      RA: RandAugment [Cubuk+, CVPRW 20]; CTA: Control Theory Augment [Berthelot+, ICLR 20]
  34. Recent methods • Dash [Xu+, ICML 21] • CoMatch [Li+, ICCV 21] • FlexMatch [Zhang+, NeurIPS 21] • SimMatch [Zheng+, CVPR 22]
  35. Open-set Semi-supervised Learning • Setting: # labeled data << # unlabeled data • Training data: labeled dog/cat samples plus unlabeled samples • Test data: dog + cat
  36. Open-set Semi-supervised Learning • # labeled data << # unlabeled data, but the unlabeled data also contains outliers (samples outside dog + cat), which damage the CNN's training
  37. Open-set Semi-supervised Learning • Goal: detect outliers in the unlabeled data when labeled data is limited, so that they do not damage training
  38. Methods for Open-set Semi-supervised Learning • Multi-Task Curriculum Framework [Yu+, ECCV 20] • OpenMatch [Saito+, NeurIPS 21] • Trash to Treasure [Huang+, ICCV 21] • Safe-Student [He+, CVPR 22]
  39. Multi-Task Curriculum Framework [Yu+, ECCV 20] • Framework: detect the outliers, then perform semi-supervised learning without them (a filtering sketch follows)
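The two-stage recipe on this slide reduces to scoring and splitting the unlabeled pool; a hypothetical sketch, where the outlier score and the 0.5 threshold are placeholders for the method-specific parts:

```python
import torch

def split_unlabeled(x_unlabeled, outlier_scores, threshold=0.5):
    """Keep likely inliers for semi-supervised training, drop the rest."""
    inlier_mask = outlier_scores < threshold    # threshold is an assumption
    return x_unlabeled[inlier_mask], x_unlabeled[~inlier_mask]
```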
  40. Recent topics in Semi-supervised Learning • Benchmark performance ≠ real-world performance • vs. self-supervised learning: when the dataset is large enough, is fine-tuning a self-supervised model sufficient?
  41. Recent topics in Semi-supervised Learning • Benchmark performance ≠ real-world performance • vs. self-supervised learning: when the dataset is large enough, is fine-tuning a self-supervised model sufficient? • vs. zero-shot prediction (e.g. CLIP): is CLIP sufficient for most datasets?
  42. Conclusion • Semi-supervised learning leverages unlabeled data to train a model when only limited labeled data is available. • Recent methods can train a high-performance model even with only 40 labeled samples. • Self-supervised learning and pretrained vision-and-language models are now challenging semi-supervised learning.