
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

Scatter Lab Inc.
September 11, 2020


Transcript

  1. A Simple Framework for Contrastive Learning of Visual Representations: Overview

    • "A Simple Framework for Contrastive Learning of Visual Representations"
    • Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
    • Google Research & Brain
    • ICML 2020
    • Proposes a framework for contrastive learning and analyzes the role and contribution of each component
    • Achieves state-of-the-art performance on ImageNet (linear evaluation)
  2. Self-Supervised Learning (SSL) 1. Introduction

    • One of the hottest keywords these days (especially in computer vision)
    • Related keywords: Unsupervised Learning, Representation (Embedding) Learning, Contrastive Learning, Augmentation
    • Trains on a pretext task with an objective function, just like supervised learning
    • Pretext task: a predictive task whose inputs and labels are constructed from unlabeled data
  3. Pretext Task 1. Introduction

    (a) Relative Patch Prediction (Doersch et al., 2015)
    (b) Jigsaw Puzzle (Noroozi et al., 2016)
    (c) Colorization (Larsson et al., 2017)
    (d) Rotation Prediction (Gidaris et al., 2018)
  4. Pretext Task 1. Introduction

    (a) Masked Language Modeling
    (b) Next Sentence Prediction
    (c) Language Modeling (auto-regressive)
  5. Supervised / Unsupervised / Self-supervised 1. Introduction

    • Supervised Learning:
      • A human labels each input according to the target task, and the model learns from those labels (e.g. text classification)
    • Unsupervised Learning:
      • No labels exist in the data, and none are used for training (e.g. clustering, auto-encoder, GAN)
    • Self-supervised Learning:
      • Inputs and labels are generated automatically from unlabeled data, and the model is trained on them just like supervised learning
  6. Contrastive Learning 1. Introduction

    • The problem of telling whether an example pair is similar or not
    • Learns representations so that the distance between a pair in latent space is small when the examples are similar and large when they are dissimilar
    • A form of metric learning: the key is learning representations that capture the examples' features and relationships well
  7-10. Key Points of Contrastive Learning 1. Introduction

    1. Example of similar and dissimilar images
      • How do we construct similar and dissimilar example pairs?
    2. Ability to know what an image represents
      • How do we build good representations?
    3. Ability to quantify if two images are similar
      • How do we measure the degree of similarity efficiently?
  11. 2. SimCLR SimCLR - 1) Data Augmentation

    • Randomly applies the following three augmentations:
      • Random Crop (with flip and resize)
      • Color Distortion
      • Gaussian Blur
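The augmentation step above can be sketched in plain NumPy. This is a toy illustration, not SimCLR's actual pipeline: the image and crop size are made up, the crop is not resized back, and color distortion and Gaussian blur are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_view(img, crop=24):
    """One stochastic 'view' of an image: random crop + random horizontal flip.

    Color distortion and Gaussian blur from the paper are left out of this
    toy sketch; only the geometric part of the pipeline is shown.
    """
    h, w = img.shape[:2]
    y = rng.integers(0, h - crop + 1)   # random top-left corner of the crop
    x = rng.integers(0, w - crop + 1)
    view = img[y:y + crop, x:x + crop]
    if rng.random() < 0.5:
        view = view[:, ::-1]            # horizontal flip with probability 0.5
    return view

img = rng.random((32, 32, 3))                  # fake 32x32 RGB image
v1, v2 = random_view(img), random_view(img)    # two correlated views -> a positive pair
print(v1.shape, v2.shape)
```

Applying the stochastic transform twice to the same image is exactly how SimCLR manufactures its positive pairs.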
  12. 2. SimCLR SimCLR - 2) Encoder

    • f( ⋅ ) is the encoder network that produces the representation h
    • A ResNet is used as the encoder (other architectures are possible)
    • h = f(x̃) = ResNet(x̃), where h ∈ ℝ^d
  13. 2. SimCLR SimCLR - 3) Projection

    • g( ⋅ ) is the projection network that maps the representation h into the latent space
    • A 2-layer nonlinear MLP (fully connected) is used
    • z = g(h) = W^(2) σ(W^(1) h)
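A minimal NumPy sketch of the projection head z = W^(2) σ(W^(1) h), taking σ to be ReLU; the dimensions (2048-d ResNet-50 output, 128-d projection) and the random weights are illustrative, not the paper's trained parameters.

```python
import numpy as np

def projection_head(h, W1, W2):
    """Nonlinear projection z = g(h) = W2 @ sigma(W1 @ h), with sigma = ReLU."""
    return W2 @ np.maximum(W1 @ h, 0.0)

rng = np.random.default_rng(0)
d, d_proj = 2048, 128                      # illustrative encoder / projection dims
W1 = 0.01 * rng.normal(size=(d, d))        # hidden layer weights
W2 = 0.01 * rng.normal(size=(d_proj, d))   # output layer weights
h = rng.normal(size=d)                     # a fake encoder representation
z = projection_head(h, W1, W2)
print(z.shape)
```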
  14. 2. SimCLR SimCLR - 4) Loss

    • Trained with cross entropy using in-batch negatives
    • More precisely, the normalized temperature-scaled cross entropy (NT-Xent)
    • Augmenting each of the N images in a batch in 2 different ways yields 2N images in total (the original images themselves are not used)
    • A pair (z_i, z_j) of views from the same original image is a positive; pairs (z_i, z_k) with views from different images are negatives
    • That is, the model is trained to pick out the positive among the 2N − 1 candidate pairs
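The loss above can be sketched in NumPy as follows. This is a simplified sketch, not the official implementation: it assumes rows 2k and 2k+1 of the projection matrix are the two views of the same image, and uses a naive (non-stabilized) log-sum-exp.

```python
import numpy as np

def nt_xent(z, temperature=0.5):
    """NT-Xent over a batch of 2N projections of shape (2N, d).

    Assumes rows 2k and 2k+1 are the two augmented views of image k.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # l2-normalize -> cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # never contrast an example with itself
    n2 = z.shape[0]
    pos = np.arange(n2) ^ 1                            # positive partner: 0<->1, 2<->3, ...
    # softmax cross entropy: each row must assign high probability to its positive
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(n2), pos].mean())

rng = np.random.default_rng(0)
loss = nt_xent(rng.normal(size=(8, 16)))   # N = 4 fake images -> 2N = 8 views
print(loss)
```

For random projections the loss sits near log(2N − 1), the value of guessing uniformly among the candidates; training pushes it toward 0.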
  15. 2. SimCLR Training Details

    1. Batch Size
      • Experimented with a wide range, from 256 up to 8192
      • With batch size N, each example has 2(N − 1) in-batch negatives, up to 16,382
    2. LARS Optimizer
      • A larger batch size requires a larger learning rate, which destabilizes training (Goyal et al., 2017)
      • The LARS optimizer is used for stable training at large batch sizes
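The negative count above follows directly from the batch layout: each of the 2N augmented views is contrasted against everything in the batch except itself and its positive partner.

```python
def num_negatives(batch_size):
    """In-batch negatives per example: 2N - 2 = 2(N - 1)."""
    return 2 * (batch_size - 1)

print(num_negatives(256))    # smallest batch in the paper's sweep
print(num_negatives(8192))   # largest batch -> 16382 negatives
```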
  16. 2. SimCLR Training Details

    3. Global Batch Normalization
      • In distributed training, the BN mean and variance are computed per device (aggregated locally per device)
      • Since a positive pair is always computed on the same device, the model can exploit this local information to raise its prediction accuracy without learning good representations
      → So the global mean and variance are used instead
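A toy NumPy example of the distinction: under per-device ("local") normalization every shard ends up with identical statistics, erasing the device-level offsets that a positive pair sharing a device could otherwise exploit, while global normalization keeps one set of statistics for the whole batch. The shard layout and offsets are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy activations sharded across 4 "devices" with deliberately different statistics.
shards = [rng.normal(loc=float(i), size=(32, 8)) for i in range(4)]

# Local BN: each device normalizes with its own mean/variance.
local = [(s - s.mean(axis=0)) / s.std(axis=0) for s in shards]

# Global BN: aggregate mean/variance over all devices first, then normalize.
full = np.concatenate(shards, axis=0)
global_norm = [(s - full.mean(axis=0)) / full.std(axis=0) for s in shards]

# Local BN makes every shard look the same (mean ~0); global BN preserves
# the per-device offsets instead of letting normalization hide them.
print([round(float(x.mean()), 3) for x in local])
print([round(float(x.mean()), 3) for x in global_norm])
```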
  17. 3. Experiments Dataset & Evaluation Protocols

    • Dataset: ImageNet ILSVRC-2012 (Russakovsky et al., 2015)
    • Evaluation Protocols
      1. Linear Evaluation
        • Linear classifier trained on learned features (frozen encoder)
      2. Semi-supervised Learning
        • Fine-tune the model on few labels
      3. Transfer Learning
        • Transfer learning by fine-tuning on other datasets
  18. 3. Experiments 2. Semi-supervised Learning

    • Sample 1% or 10% of the labeled ILSVRC-2012 training data (in a class-balanced way) for training and evaluation
    • At 1%, about 12.8 training images per class
    • Improves roughly 10% over the previous state-of-the-art
    • Notably, fine-tuning on 100% of the labels also beats training from scratch (ResNet (4x): 78.4% / 94.2% → 80.4% / 95.4%)
  19. 4. Discussion Large Models

    • Unsupervised contrastive learning benefits (more) from bigger models
    • Larger models yield larger performance gains
    • The improvement from scaling up the model is steeper for unsupervised than for supervised learning, i.e., unsupervised learning gains relatively more
  20. 4. Discussion Nonlinear Projection

    • Which projection head g( ⋅ ) works best?
    • Three variants of g( ⋅ ) were tested:
      • Identity mapping
      • Linear projection
      • Nonlinear projection with one additional hidden layer
    • Nonlinear projection is about 3% better than linear projection and more than 10% better than no projection (see Figure 8)
  21. 4. Discussion Nonlinear Projection

    • As the representation for downstream tasks, which is better: h before the projection or z = g(h) after it?
    • h is considerably better than z = g(h) (see Table 3 & Figure B.4)
    • Why?
      • z = g(h) is trained to be invariant to data transformation
      • By the nature of the contrastive loss, transformed versions of an image (color, rotation, etc.) are trained to look like the same image
      • But that transformation information can be useful for downstream tasks
  22. 4. Discussion Normalized Temperature-scaled Cross Entropy (NT-Xent)

    • l2 normalization (i.e. cosine similarity) along with temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives
    • Unlike cross entropy, other objectives cannot express the relative hardness of the negatives
  23. 4. Discussion Normalized Temperature-scaled Cross Entropy (NT-Xent)

    • How should the similarity function and the temperature term used for scaling be chosen?
    • Cosine similarity vs. dot product
    • Cosine similarity with τ = 0.1 works best
    • Contrastive accuracy is highest with the dot product, but linear evaluation is actually better with cosine similarity
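The difference between the two similarity functions can be seen in one toy example (the vectors are made up): cosine similarity is norm-invariant and bounded by ±1/τ, whereas the raw dot product grows with vector norm, which lets the model cheat by inflating norms instead of aligning directions.

```python
import numpy as np

def cosine_sim(a, b, tau=0.1):
    """Temperature-scaled cosine similarity, the form chosen for NT-Xent."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b) / tau

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])   # same direction as a, twice the norm

print(cosine_sim(a, b))          # ~10.0: bounded by +/- 1/tau, independent of norm
print(float(a @ b))              # 28.0: unnormalized dot product scales with norm
```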
  24. 4. Discussion Batch Size and Training Time

    • When training for only a few epochs, a larger batch size improves performance sharply
    • Unlike supervised learning, contrastive learning converges more effectively with larger batch sizes because they provide more negative examples
  25. Conclusion 5. Conclusion

    • Proposes SimCLR, a simple yet effective framework for contrastive learning
    • Greatly improves performance compared to the previous state-of-the-art
    • Provides many insights by experimentally analyzing the components that matter for representation learning
    • Personal thoughts:
      • The overall structure, the nonlinear projection, NT-Xent, etc. could be applied to RRM and SSM training
      • It would be nice to have an evaluation method for representations like linear evaluation (see Table 5)
      • (Roughly speculating) there may be something promising in building a similarity model via self-supervision
  26. Thank you! ✌ If you have additional questions or anything you are curious about, feel free to reach out anytime!

    Joohong Lee (ML Research Scientist @ Pingpong)
    Email. [email protected]
    Facebook. @roomylee
    LinkedIn. @roomylee
  27. Reference A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020)

    • Papers
      • (SimCLR) A Simple Framework for Contrastive Learning of Visual Representations: https://arxiv.org/abs/2002.05709
      • Official Repository: https://github.com/google-research/simclr
      • (SimCLR v2) Big Self-Supervised Models are Strong Semi-Supervised Learners: https://arxiv.org/abs/2006.10029
      • (MoCo) Momentum Contrast for Unsupervised Visual Representation Learning: https://arxiv.org/abs/1911.05722
      • (MoCo v2) Improved Baselines with Momentum Contrastive Learning: https://arxiv.org/abs/2003.04297
    • Other Materials
      • SimCLR Slide by Google Brain: https://docs.google.com/presentation/d/1ccddJFD_j3p3h0TCqSV9ajSi2y1yOfh0-lJoK29ircs/edit?usp=sharing
      • The Illustrated SimCLR Framework: https://amitness.com/2020/03/illustrated-simclr/
      • PR-231 Video: https://www.youtube.com/watch?v=FWhM3juUM6s