SimCLR: A Simple Framework for Contrastive Learning of Visual Representations

Scatter Lab Inc.
September 11, 2020

Transcript

  1. SimCLR: A Simple Framework for Contrastive Learning of Visual Representation

    Joohong Lee (ML Research Scientist @ Pingpong)
  2. A Simple Framework for Contrastive Learning of Visual Representation Overview

    • “A Simple Framework for Contrastive Learning of Visual Representations” • Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton • Google Research & Brain • ICML 2020 • Proposes a framework for contrastive learning and analyzes the meaning and contribution of each of its components • Achieves state-of-the-art performance on ImageNet (linear evaluation)
  3. 1. Introduction A Simple Framework for Contrastive Learning of Visual Representation (Chen et al., 2020)
  4. Self-Supervised Learning (SSL) 1. Introduction

    • One of the hottest keywords these days (especially in computer vision) • Related keywords: Unsupervised Learning, Representation (Embedding) Learning, Contrastive Learning, Augmentation • Trains on a pretext task with an objective function, just like supervised learning • Pretext task: a predictive task using inputs and labels constructed from unlabeled data
  5. Pretext Task 1. Introduction

    (a) Relative Patch Prediction (Doersch et al., 2015) (b) Jigsaw Puzzle (Noroozi et al., 2016) (c) Colorization (Larsson et al., 2017) (d) Rotation Prediction (Gidaris et al., 2018)
  6. Pretext Task 1. Introduction

    (a) Masked Language Modeling (b) Next Sentence Prediction (c) Language Modeling (auto-regressive)
  7. Supervised / Unsupervised / Self-supervised 1. Introduction

    • Supervised Learning: a human builds labels for each input according to the target task, and the model learns from them (e.g. text classification) • Unsupervised Learning: no labels exist on the data and none are used for training (e.g. clustering, auto-encoders, GANs) • Self-supervised Learning: inputs and labels are generated automatically from unlabeled data, and the model is then trained as in supervised learning
  8. Contrastive Learning 1. Introduction

    • The problem of telling whether an example pair is similar or not • Learns representations so that the distance between a pair in latent space is small for similar examples and large for dissimilar ones • A kind of metric learning: the focus is on learning representations that capture the characteristics of examples and the relations between them
  9. Key Points of Contrastive Learning 1. Introduction

    1. Example of similar and dissimilar images • How do we construct similar and dissimilar example pairs?
  12. Key Points of Contrastive Learning 1. Introduction

    1. Example of similar and dissimilar images • How do we construct similar and dissimilar example pairs? 2. Ability to know what an image represents • How do we build good representations? 3. Ability to quantify if two images are similar • How do we measure the degree of similarity effectively?
  13. 2. SimCLR A Simple Framework for Contrastive Learning of Visual Representation (Chen et al., 2020)
  14. 2. SimCLR SimCLR

  15. 2. SimCLR SimCLR - 1) Data Augmentation

    • The following three augmentation techniques are applied randomly: • Random Crop (with flip and resize) • Color Distortion • Gaussian Blur
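The two-view augmentation pipeline (random crop with flip and resize, color distortion, Gaussian blur) can be sketched with simplified NumPy stand-ins; the paper uses proper image transforms, so the crop/jitter/blur below are toy approximations for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_resize(img, out_size):
    """Random crop (with horizontal flip), then nearest-neighbor resize."""
    h, w, _ = img.shape
    ch = rng.integers(h // 2, h + 1)          # crop height
    cw = rng.integers(w // 2, w + 1)          # crop width
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    if rng.random() < 0.5:                    # random horizontal flip
        crop = crop[:, ::-1]
    ys = np.arange(out_size) * ch // out_size # nearest-neighbor index maps
    xs = np.arange(out_size) * cw // out_size
    return crop[ys][:, xs]

def color_distortion(img, strength=0.5):
    """Crude color jitter: random per-channel scaling plus a brightness shift."""
    scale = 1.0 + strength * rng.uniform(-1, 1, size=3)
    shift = strength * rng.uniform(-0.2, 0.2)
    return np.clip(img * scale + shift, 0.0, 1.0)

def gaussian_blur(img, k=3):
    """Box blur via separable 1D averaging (a stand-in for a Gaussian kernel)."""
    kernel = np.ones(k) / k
    for axis in (0, 1):
        img = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, img)
    return img

def augment(img, out_size=32):
    x = random_crop_resize(img, out_size)
    x = color_distortion(x)
    if rng.random() < 0.5:                    # blur applied stochastically
        x = gaussian_blur(x)
    return x

img = rng.uniform(size=(64, 64, 3))           # toy image with values in [0, 1]
view1, view2 = augment(img), augment(img)     # two correlated views of one image
```

Applying `augment` twice to the same image produces the correlated pair that the contrastive loss later treats as a positive.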
  16. 2. SimCLR SimCLR - 2) Encoder

    • f( ⋅ ) is the encoder network that produces the representation h • A ResNet is used as the encoder (other architectures are also possible) • h = f(x̃) = ResNet(x̃), where h ∈ ℝ^d
  17. 2. SimCLR SimCLR - 3) Projection

    • g( ⋅ ) is the projection network that maps the representation h into the latent space • A 2-layer nonlinear MLP (fully connected) is used • z = g(h) = W^(2) σ(W^(1) h)
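The projection head is small enough to write out directly. A minimal NumPy forward pass of the 2-layer MLP z = W^(2) σ(W^(1) h), assuming randomly initialized (untrained) weights and the ResNet-50 dimensions commonly used with SimCLR (d = 2048, projected to 128):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_proj = 2048, 128                    # feature dim / projection dim (assumed)

# Hypothetical randomly-initialized weights; in practice these are learned.
W1 = rng.normal(scale=0.01, size=(d, d))
W2 = rng.normal(scale=0.01, size=(d_proj, d))

def projection_head(h):
    """z = g(h) = W2 @ ReLU(W1 @ h): the 2-layer nonlinear MLP."""
    return W2 @ np.maximum(W1 @ h, 0.0)  # sigma is a ReLU nonlinearity

h = rng.normal(size=d)                   # a representation from the encoder f(.)
z = projection_head(h)                   # point in the contrastive latent space
```

The contrastive loss is computed on z, while (as discussed later) the pre-projection h is what gets reused downstream.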
  18. 2. SimCLR SimCLR - 4) Loss

    • Trains with a cross-entropy objective over in-batch negatives • Precisely, the Normalized Temperature-scaled Cross Entropy (NT-Xent) • Augmenting each of the N images in a batch in 2 different ways yields 2N images (the original images themselves are not used) • A pair (zi, zj) of images coming from the same original image is the positive; pairs (zi, zk) with images from different origins are the negatives • That is, the model learns to pick out the positive from the 2N − 1 candidate pairs
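The loss described above can be sketched as a NumPy function. The pairing convention (views 2k and 2k+1 belong to image k) and the toy embeddings are assumptions for illustration:

```python
import numpy as np

def nt_xent(z, tau=0.1):
    """NT-Xent over 2N embeddings, where rows 2k and 2k+1 of z are the
    two augmented views of image k."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize -> cosine
    sim = z @ z.T / tau                               # similarities / temperature
    np.fill_diagonal(sim, -np.inf)                    # a view never matches itself
    pos = np.arange(len(z)) ^ 1                       # partner index: (0,1),(2,3),...
    # cross entropy of picking the positive among the 2N - 1 candidates
    logits = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z)), pos].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))   # N = 4 images -> 2N = 8 embeddings (toy data)
loss = nt_xent(z)              # non-negative scalar
```

Each row of `sim` is a 2N-way softmax with the self-similarity masked out, which is exactly the "find the positive among 2N − 1 pairs" formulation.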
  19. 2. SimCLR SimCLR - Summary (figure: views of the same image attract; views of different images repel)

  20. 2. SimCLR SimCLR - Algorithm

  21. 2. SimCLR Augmentation SimCLR - Algorithm

  22. 2. SimCLR Encoding Representation SimCLR - Algorithm

  23. 2. SimCLR Nonlinear Projection SimCLR - Algorithm

  24. 2. SimCLR Similarity & Loss SimCLR - Algorithm

  25. 2. SimCLR Training Details

    1. Batch Size • Experimented with batch sizes ranging from 256 to 8192 • With a batch size of N, each positive pair has 2(N − 1) in-batch negatives — up to 16,382 at N = 8192 2. LARS Optimizer • A larger batch size requires a larger learning rate, which destabilizes training (Goyal et al., 2017) • The LARS optimizer is used for stable training at large batch sizes
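The negative count follows directly from the batch construction and can be checked with one line of arithmetic:

```python
# Each of the N images yields 2 views; for a given positive pair, every view
# of the other N - 1 images serves as a negative -> 2(N - 1) negatives.
def negatives_per_pair(batch_size):
    return 2 * (batch_size - 1)

print(negatives_per_pair(8192))  # -> 16382
```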
  26. 2. SimCLR Training Details

    3. Global Batch Normalization • In distributed training, the BN mean and variance are aggregated locally per device • Since a positive pair is always computed on the same device, the model can exploit these local statistics to improve its predictions without learning better representations (a local information leak)
    → Aggregate the mean and variance globally across all devices
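The fix can be illustrated in NumPy: instead of each shard normalizing with its own statistics, one shared mean and variance is derived from all shards (a simplified two-pass version; real implementations synchronize running statistics across devices):

```python
import numpy as np

rng = np.random.default_rng(0)
# Activations for the same layer, split across 4 hypothetical devices.
# Each shard has a different mean, mimicking device-local statistics.
shards = [rng.normal(loc=i, size=(64, 8)) for i in range(4)]

# Local BN would use these per-device means (the leaky variant):
local_means = [s.mean(axis=0) for s in shards]

# Global BN: derive one shared mean and variance from all devices.
n = sum(s.shape[0] for s in shards)
mean = sum(s.sum(axis=0) for s in shards) / n
var = sum(((s - mean) ** 2).sum(axis=0) for s in shards) / n

# Every device normalizes with the same global statistics.
normalized = [(s - mean) / np.sqrt(var + 1e-5) for s in shards]
```

With the shared statistics, positives on one device can no longer be distinguished from negatives on another simply by their device-local activation scale.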
  27. 2. SimCLR SimCLR for Downstream tasks

  28. 3. Experiments A Simple Framework for Contrastive Learning of Visual Representation (Chen et al., 2020)
  29. 3. Experiments Dataset & Evaluation Protocols

    • Dataset: ImageNet ILSVRC-2012 (Russakovsky et al., 2015) • Evaluation Protocols 1. Linear Evaluation • A linear classifier trained on the learned features (frozen encoder) 2. Semi-supervised Learning • Fine-tune the model on a small fraction of the labels 3. Transfer Learning • Fine-tune on other datasets
  30. 3. Experiments 1. Linear Evaluation

  31. 3. Experiments 2. Semi-supervised Learning

    • Sample 1% or 10% of the labeled ILSVRC-2012 training data (in a class-balanced way) and fine-tune on it • The 1% setting corresponds to roughly 12.8 labeled images per class • Improves about 10% over the previous state of the art • Notably, fine-tuning on 100% of the labels outperforms training from scratch (ResNet (4x): 78.4% / 94.2% → 80.4% / 95.4%)
  32. 3. Experiments 3. Transfer Learning

  33. 4. Discussion A Simple Framework for Contrastive Learning of Visual Representation (Chen et al., 2020)
  34. 4. Discussion Large Models

    • Unsupervised contrastive learning benefits (more) from bigger models • Larger models yield better performance • As model size grows, unsupervised learning improves faster than supervised learning • That is, unsupervised learning gains relatively more from scale
  35. 4. Discussion Nonlinear Projection

    • Which projection head g( ⋅ ) works best? • g( ⋅ ) was tested in three variants: • Identity mapping • Linear projection • Nonlinear projection with one additional hidden layer • The nonlinear projection is about 3% better than the linear one and more than 10% better than no projection (see Figure 8)
  36. 4. Discussion Nonlinear Projection

    • As the representation used for downstream tasks, which is better: h (before the projection) or z = g(h) (after it)? • h is far better than z = g(h) (see Table 3 & Figure B.4) • Why? • z = g(h) is trained to be invariant to data transformation • Under the contrastive loss, transformed images (color changes, rotations, etc.) are trained to look like the same image • But such discarded information can still be useful for downstream tasks
  37. 4. Discussion Normalized Temperature-scaled Cross Entropy (NT-Xent)

    • ℓ2 normalization (i.e. cosine similarity) along with temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives • Compared with cross entropy, other objectives cannot express the relative hardness among negatives
  38. 4. Discussion Normalized Temperature-scaled Cross Entropy (NT-Xent)

    • How should the similarity function and the temperature term used for scaling be set? • Cosine similarity vs. dot product • Cosine similarity with τ = 0.1 works best • Contrastive accuracy is highest with the dot product, but linear evaluation is actually better with cosine similarity
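The difference between the two similarity choices is easy to make concrete with toy vectors (illustrative only):

```python
import numpy as np

def dot_sim(u, v):
    return u @ v

def cosine_sim(u, v, tau=0.1):
    # L2-normalize both vectors, then scale by the temperature tau.
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return (u @ v) / tau

u = np.array([3.0, 0.0])
v = np.array([300.0, 0.0])   # same direction, much larger magnitude
w = np.array([0.0, 3.0])     # orthogonal direction

# The dot product conflates direction with magnitude; cosine does not.
print(dot_sim(u, v), cosine_sim(u, v))  # 900.0 vs 10.0
print(dot_sim(u, w), cosine_sim(u, w))  # 0.0 vs 0.0
```

With cosine similarity, every pair lands in [−1/τ, 1/τ] regardless of embedding norms, so the temperature alone controls how sharply the softmax concentrates on hard negatives.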
  39. 4. Discussion Batch Size and Training Time

    • When training for only a few epochs, performance improves sharply as the batch size grows • Unlike supervised learning, contrastive learning converges more effectively with larger batches because they provide more negative examples
  40. 5. Conclusion A Simple Framework for Contrastive Learning of Visual Representation (Chen et al., 2020)
  41. Conclusion 5. Conclusion

    • Proposes SimCLR, a simple yet effective framework for contrastive learning • Substantially improves performance over the previous state of the art • Provides many insights by experimentally analyzing the key ingredients of representation learning • Presenter's thoughts: • The overall structure, the nonlinear projection, NT-Xent, etc. could be applied to RRM and SSM training • It would be nice to have an evaluation method for representations like linear evaluation (see Table 5) • (Somewhat speculative) There might be a way to build a similarity model with self-supervision, with a little thought
  42. Thank you! ✌ If you have any further questions or anything you are curious about,

    feel free to reach out anytime! Joohong Lee (ML Research Scientist @ Pingpong) Email. joohong@scatterlab.co.kr Facebook. @roomylee LinkedIn. @roomylee
  43. Reference A Simple Framework for Contrastive Learning of Visual Representation (Chen et al., 2020)

    • Papers
    • (SimCLR) A Simple Framework for Contrastive Learning of Visual Representations: https://arxiv.org/abs/2002.05709
    • Official Repository: https://github.com/google-research/simclr
    • (SimCLR v2) Big Self-Supervised Models are Strong Semi-Supervised Learners: https://arxiv.org/abs/2006.10029
    • (MoCo) Momentum Contrast for Unsupervised Visual Representation Learning: https://arxiv.org/abs/1911.05722
    • (MoCo v2) Improved Baselines with Momentum Contrastive Learning: https://arxiv.org/abs/2003.04297
    • Other Materials
    • SimCLR Slide by Google Brain: https://docs.google.com/presentation/d/1ccddJFD_j3p3h0TCqSV9ajSi2y1yOfh0-lJoK29ircs/edit?usp=sharing
    • The Illustrated SimCLR Framework: https://amitness.com/2020/03/illustrated-simclr/
    • PR-231 Video: https://www.youtube.com/watch?v=FWhM3juUM6s