Slide 1

Slide 1 text

SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
Joohong Lee (ML Research Scientist @ Pingpong)

Slide 2

Slide 2 text

A Simple Framework for Contrastive Learning of Visual Representations Overview
• "A Simple Framework for Contrastive Learning of Visual Representations"
• Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton
• Google Research & Brain
• ICML 2020
• Proposes a framework for contrastive learning and analyzes what each component means and contributes
• Achieves state-of-the-art performance on ImageNet (linear evaluation)

Slide 3

Slide 3 text

1. Introduction A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020)

Slide 4

Slide 4 text

Self-Supervised Learning (SSL) 1. Introduction
• One of the hottest keywords these days (especially in Computer Vision)
• Related keywords: Unsupervised Learning, Representation (Embedding) Learning, Contrastive Learning, Augmentation
• Trains on a pretext task with an objective function, just like supervised learning
• Pretext Task: a predictive task on inputs and labels constructed from unlabeled data

Slide 5

Slide 5 text

Pretext Task 1. Introduction
(a) Relative Patch Prediction (Doersch et al., 2015)
(b) Jigsaw Puzzle (Noroozi et al., 2016)
(c) Colorization (Larsson et al., 2017)
(d) Rotation Prediction (Gidaris et al., 2018)

Slide 6

Slide 6 text

Pretext Task 1. Introduction
(a) Masked Language Modeling
(b) Next Sentence Prediction
(c) Language Modeling (auto-regressive)

Slide 7

Slide 7 text

Supervised / Unsupervised / Self-supervised 1. Introduction
• Supervised Learning:
• Humans create labels for the inputs according to the target task, and the model learns from them (e.g. text classification)
• Unsupervised Learning:
• No labels exist in the data, and none are used for training (e.g. clustering, auto-encoder, GAN)
• Self-supervised Learning:
• Automatically builds inputs and labels from unlabeled data, then trains just like supervised learning

Slide 8

Slide 8 text

Contrastive Learning 1. Introduction
• The task of predicting whether an example pair is similar or not
• Learns representations so that the latent-space distance within a pair is small when the examples are similar and large when they differ
• A kind of metric learning: the focus is on learning representations that capture the characteristics of examples and the relationships between them

Slide 9

Slide 9 text

Key Points of Contrastive Learning 1. Introduction
1. Example of similar and dissimilar images
• How do we construct similar and dissimilar example pairs?

Slide 10

Slide 10 text

Key Points of Contrastive Learning 1. Introduction
1. Example of similar and dissimilar images
• How do we construct similar and dissimilar example pairs?

Slide 11

Slide 11 text

Key Points of Contrastive Learning 1. Introduction
1. Example of similar and dissimilar images
• How do we construct similar and dissimilar example pairs?
2. Ability to know what an image represents
• How do we build good representations?

Slide 12

Slide 12 text

Key Points of Contrastive Learning 1. Introduction
1. Example of similar and dissimilar images
• How do we construct similar and dissimilar example pairs?
2. Ability to know what an image represents
• How do we build good representations?
3. Ability to quantify if two images are similar
• How do we measure the degree of similarity effectively?

Slide 13

Slide 13 text

2. SimCLR A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020)

Slide 14

Slide 14 text

2. SimCLR SimCLR

Slide 15

Slide 15 text

2. SimCLR SimCLR - 1) Data Augmentation
• Randomly applies the following three augmentation techniques:
• Random crop (with flip and resize)
• Color distortion
• Gaussian blur
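To make the pipeline concrete, here is a minimal NumPy sketch that produces the two augmented views of one image. It is illustrative only: the paper's implementation uses TensorFlow image ops, and the crop range, jitter strength, and blur kernel below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_resize(img, out_size):
    """Crop a random square region and resize back via nearest-neighbour sampling."""
    h, w, _ = img.shape
    size = int(rng.integers(h // 2, h + 1))      # crop side length (assumed range)
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    crop = img[top:top + size, left:left + size]
    ys = np.linspace(0, size - 1, out_size).astype(int)
    xs = np.linspace(0, size - 1, out_size).astype(int)
    return crop[np.ix_(ys, xs)]

def color_distort(img, strength=0.5):
    """Jitter brightness and per-channel scaling, then clip to [0, 1]."""
    img = img * rng.uniform(1 - strength, 1 + strength)           # brightness
    img = img * rng.uniform(1 - strength, 1 + strength, size=3)   # channel scaling
    return np.clip(img, 0.0, 1.0)

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur with reflection padding."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="reflect")
    # horizontal pass, then vertical pass
    tmp = sum(k[i] * pad[:, i:i + img.shape[1], :] for i in range(2 * radius + 1))
    return sum(k[i] * tmp[i:i + img.shape[0], :, :] for i in range(2 * radius + 1))

def two_views(img, out_size=16):
    """Produce the two correlated views of the same image used by SimCLR."""
    views = []
    for _ in range(2):
        v = random_crop_resize(img, out_size)
        v = color_distort(v)
        v = gaussian_blur(v)
        views.append(v)
    return views

image = rng.random((32, 32, 3))   # stand-in for a real photo
xi, xj = two_views(image)
```

Because the crops and jitter are sampled independently, the two views differ while still depicting the same underlying image, which is exactly the positive pair the loss needs.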

Slide 16

Slide 16 text

2. SimCLR SimCLR - 2) Encoder
• f(⋅) is the encoder network that produces the representation h
• ResNet is used as the encoder (other architectures are possible)
• h = f(x̃) = ResNet(x̃), where h ∈ ℝ^d

Slide 17

Slide 17 text

2. SimCLR SimCLR - 3) Projection
• g(⋅) is the projection network that maps the representation h into the latent space
• A 2-layer nonlinear MLP (fully connected) is used
• z = g(h) = W^(2) σ(W^(1) h)
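The formula above can be sketched in a few lines of NumPy: z = g(h) = W^(2) σ(W^(1) h) with σ = ReLU. The widths (2048 → 2048 → 128) match the common ResNet-50 SimCLR setup but are assumptions here, and biases are omitted exactly as in the slide's formula.

```python
import numpy as np

rng = np.random.default_rng(0)

d, proj_dim = 2048, 128                      # assumed ResNet-50-style widths
W1 = 0.01 * rng.normal(size=(d, d))          # hidden layer of the MLP head
W2 = 0.01 * rng.normal(size=(proj_dim, d))   # output layer of the MLP head

def projection_head(h):
    """z = g(h) = W2 @ relu(W1 @ h); sigma = ReLU, biases omitted."""
    return W2 @ np.maximum(W1 @ h, 0.0)

h = rng.normal(size=d)    # stands in for a representation h = f(x_tilde)
z = projection_head(h)
```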

Slide 18

Slide 18 text

2. SimCLR SimCLR - 4) Loss
• Trains with cross entropy using in-batch negatives
• Precisely, the Normalized Temperature-scaled Cross Entropy (NT-Xent)
• Augmenting each of the N images in a batch in 2 different ways yields 2N images in total (the original images themselves are not used)
• A pair (z_i, z_j) of views from the same original image is a positive; pairs (z_i, z_k) with views from different originals are negatives
• That is, the model learns to find the positive among the 2N − 1 candidate pairs
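The loss can be sketched in NumPy as follows. This is a simplified reading of NT-Xent in which the two views of image k sit at rows 2k and 2k+1; the default τ = 0.5 is an assumption for the demo, not the paper's tuned value.

```python
import numpy as np

def nt_xent(z, tau=0.5):
    """NT-Xent over 2N projections; rows 2k and 2k+1 are views of image k."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # l2-normalize -> cosine similarity
    sim = (z @ z.T) / tau                             # temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                    # a view never pairs with itself
    pos = np.arange(len(z)) ^ 1                       # positive partner: 0<->1, 2<->3, ...
    # cross entropy of picking the positive among the 2N - 1 remaining pairs
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - sim[np.arange(len(z)), pos]))

rng = np.random.default_rng(0)
N, dim = 4, 8
base = rng.normal(size=(N, dim))
# "good" projections: the two views of each image nearly coincide
z_aligned = np.repeat(base, 2, axis=0) + 0.01 * rng.normal(size=(2 * N, dim))
loss_aligned = nt_xent(z_aligned)
# random projections: positives are no closer than negatives
loss_random = nt_xent(rng.normal(size=(2 * N, dim)))
```

Aligned positives give a lower loss than random projections, which is what drives the representation toward the attract/repel behavior summarized on the next slide.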

Slide 19

Slide 19 text

2. SimCLR SimCLR - Summary (diagram: two augmented views of the same image attract each other; views from different images repel)

Slide 20

Slide 20 text

2. SimCLR SimCLR - Algorithm

Slide 21

Slide 21 text

2. SimCLR SimCLR - Algorithm: Augmentation

Slide 22

Slide 22 text

2. SimCLR SimCLR - Algorithm: Encoding Representation

Slide 23

Slide 23 text

2. SimCLR SimCLR - Algorithm: Nonlinear Projection

Slide 24

Slide 24 text

2. SimCLR SimCLR - Algorithm: Similarity & Loss

Slide 25

Slide 25 text

2. SimCLR Training Details
1. Batch Size
• Experimented with batch sizes from 256 up to 8192
• With a batch size of N = 8192, each positive pair has up to 2(N − 1) = 16382 in-batch negatives
2. LARS Optimizer
• A larger batch size requires a larger learning rate, which destabilizes training (Goyal et al., 2017)
• Uses the LARS optimizer for stable training with large batch sizes
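To show what LARS buys at large batch sizes, here is a minimal sketch of its layer-wise trust-ratio scaling. Momentum and the full optimizer state are omitted, and the hyperparameter values are illustrative, not the paper's.

```python
import numpy as np

def lars_update(w, grad, base_lr=0.3, trust_coef=0.001, weight_decay=1e-6):
    """One LARS step for a single layer: scale the step by a layer-wise trust ratio."""
    g = grad + weight_decay * w
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    # trust ratio keeps ||update|| proportional to ||w||, whatever the gradient scale
    local_lr = trust_coef * w_norm / (g_norm + 1e-12)
    return w - base_lr * local_lr * g

rng = np.random.default_rng(0)
w = rng.normal(size=100)
after_small = lars_update(w, 0.01 * rng.normal(size=100))   # tiny gradient
after_large = lars_update(w, 100.0 * rng.normal(size=100))  # huge gradient
```

The step size is the same in both cases: because the step norm is tied to the weight norm rather than the gradient norm, a sudden large-batch gradient cannot blow up any single layer, which is the instability Goyal et al. (2017) describe.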

Slide 26

Slide 26 text

2. SimCLR Training Details
3. Global Batch Normalization
• With distributed training, BN means and variances are computed per device (aggregated locally per device)
• Since a positive pair is always computed on the same device, the model can exploit these local statistics to raise its prediction accuracy without actually learning good representations
→ Use global means and variances aggregated across devices
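The fix can be illustrated with plain NumPy: instead of each device normalizing with its own statistics, devices share counts, sums, and squared sums, so everyone normalizes with the same global mean and variance. The sharding below is simulated, not actual distributed code.

```python
import numpy as np

rng = np.random.default_rng(0)
# activations sharded across 4 simulated "devices", 32 examples of 8 features each
shards = [rng.normal(loc=float(i), size=(32, 8)) for i in range(4)]

# local statistics: what per-device BN would normalize with
local_means = [s.mean(axis=0) for s in shards]

# global statistics: devices exchange counts, sums, and squared sums
n = sum(s.shape[0] for s in shards)
total = sum(s.sum(axis=0) for s in shards)
total_sq = sum((s ** 2).sum(axis=0) for s in shards)
global_mean = total / n
global_var = total_sq / n - global_mean ** 2

full = np.concatenate(shards)   # reference: all the data gathered in one place
```

The aggregated statistics match what a single device holding all the data would compute, while each local mean is visibly biased by its own shard, which is exactly the per-device signal the model could otherwise exploit.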

Slide 27

Slide 27 text

2. SimCLR SimCLR for Downstream tasks

Slide 28

Slide 28 text

3. Experiments A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020)

Slide 29

Slide 29 text

3. Experiments Dataset & Evaluation Protocols
• Dataset: ImageNet ILSVRC-2012 (Russakovsky et al., 2015)
• Evaluation Protocols
1. Linear Evaluation
• A linear classifier trained on learned features (frozen encoder)
2. Semi-supervised Learning
• Fine-tune the model on a few labels
3. Transfer Learning
• Transfer by fine-tuning on other datasets
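A minimal sketch of the linear-evaluation idea: features come from a frozen encoder (here just a fixed random ReLU projection, purely a stand-in for a trained network), and only a linear classifier on top is fitted. A ridge-regression probe onto one-hot labels stands in for the logistic-regression classifier used in practice; the toy data and all dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# frozen "encoder": a fixed random ReLU projection standing in for a trained network
W_enc = rng.normal(size=(20, 64))
def encode(x):
    return np.maximum(x @ W_enc, 0.0)   # never updated during evaluation

# toy 3-class dataset: well-separated Gaussian clusters
labels = rng.integers(0, 3, size=300)
centers = 3.0 * rng.normal(size=(3, 20))
X = centers[labels] + rng.normal(size=(300, 20))
feats = encode(X)

# linear probe: ridge regression onto one-hot targets; the encoder stays frozen
Y = np.eye(3)[labels]
A = feats.T @ feats + 1e-3 * np.eye(feats.shape[1])
W_probe = np.linalg.solve(A, feats.T @ Y)
pred = (feats @ W_probe).argmax(axis=1)
accuracy = float((pred == labels).mean())
```

Because only `W_probe` is trained, the accuracy measures how linearly separable the frozen features are, which is the point of the protocol.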

Slide 30

Slide 30 text

3. Experiments 1. Linear Evaluation

Slide 31

Slide 31 text

3. Experiments 2. Semi-supervised Learning
• Sample 1% or 10% of the labeled ILSVRC-2012 training data in a class-balanced way, then fine-tune and evaluate
• The 1% setting corresponds to about 12.8 images per class
• Roughly 10% improvement over the previous state of the art
• Notably, fine-tuning on 100% of the labels beats training from scratch (ResNet (4x): 78.4% / 94.2% → 80.4% / 95.4%)

Slide 32

Slide 32 text

3. Experiments 3. Transfer Learning

Slide 33

Slide 33 text

4. Discussion A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020)

Slide 34

Slide 34 text

4. Discussion Large Models
• Unsupervised contrastive learning benefits (more) from bigger models
• The larger the model, the larger the performance gain
• As model size grows, the improvement is steeper for unsupervised than for supervised learning
• That is, unsupervised learning gains relatively more from scale

Slide 35

Slide 35 text

4. Discussion Nonlinear Projection
• Which projection head g(⋅) works best?
• Three variants of g(⋅) were tested:
• Identity mapping
• Linear projection
• Nonlinear projection with one additional hidden layer
• Nonlinear projection is about 3% better than linear projection and more than 10% better than no projection (see Figure 8)

Slide 36

Slide 36 text

4. Discussion Nonlinear Projection
• As the representation for downstream tasks, which is better: h (before projection) or z = g(h) (after projection)?
• h is far better than z = g(h) (see Table 3 & Figure B.4)
• Why?
• z = g(h) is trained to be invariant to data transformation
• By the nature of the contrastive loss, transformed images (color, rotation, etc.) are learned to be the same image
• But that discarded information can still be useful for downstream tasks

Slide 37

Slide 37 text

4. Discussion Normalized Temperature-scaled Cross Entropy (NT-Xent)
• l2 normalization (i.e. cosine similarity) along with temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives
• Compared to cross entropy, other objectives cannot express the relative hardness of negatives

Slide 38

Slide 38 text

4. Discussion Normalized Temperature-scaled Cross Entropy (NT-Xent)
• How should the similarity function and the temperature scaling term be set?
• Cosine similarity vs. dot product
• Cosine similarity with τ = 0.1 works best
• Dot product gives the best contrastive accuracy, but cosine similarity is better on linear evaluation
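A small demo of why the temperature matters: with cosine similarities, dividing by a small τ sharpens the softmax so that hard negatives (wrong pairs with high similarity) carry almost all of the weight. The similarity values below are made up for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# made-up cosine similarities of one anchor to [positive, hard negative, easy, easy]
sims = np.array([0.9, 0.7, 0.1, 0.0])

sharp = softmax(sims / 0.1)   # tau = 0.1, the value the paper found best with cosine
flat = softmax(sims / 1.0)    # tau = 1.0 for comparison

# fraction of the total negative weight carried by the hard negative
hard_share_sharp = float(sharp[1] / sharp[1:].sum())
hard_share_flat = float(flat[1] / flat[1:].sum())
```

At τ = 0.1 the hard negative dominates the negative weighting, while at τ = 1.0 easy negatives dilute it, which matches the slide's claim that an appropriate temperature helps the model learn from hard negatives.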

Slide 39

Slide 39 text

4. Discussion Batch Size and Training Time
• When training for only a few epochs, larger batch sizes improve performance sharply
• Unlike supervised learning, contrastive learning converges more effectively with larger batches because they supply more negative examples

Slide 40

Slide 40 text

5. Conclusion A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020)

Slide 41

Slide 41 text

Conclusion 5. Conclusion
• Proposes SimCLR, a simple yet effective framework for contrastive learning
• Greatly improves performance over the previous state of the art
• Provides many insights by experimentally analyzing the key components of representation learning
• Presenter's thoughts:
• The overall structure, the nonlinear projection, NT-Xent, etc. seem worth trying for RRM and SSM training
• An evaluation method for representations, like linear evaluation, would be useful to have (see Table 5)
• (Somewhat abstract, but) with a bit of thought there seems to be a way to build a similarity model with self-supervision

Slide 42

Slide 42 text

Thank you ✌
If you have any further questions or anything you are curious about, feel free to reach out anytime!
Joohong Lee (ML Research Scientist @ Pingpong)
Email. [email protected]
Facebook. @roomylee
LinkedIn. @roomylee

Slide 43

Slide 43 text

Reference A Simple Framework for Contrastive Learning of Visual Representations (Chen et al., 2020)
• Papers
• (SimCLR) A Simple Framework for Contrastive Learning of Visual Representations: https://arxiv.org/abs/2002.05709
• Official Repository: https://github.com/google-research/simclr
• (SimCLR v2) Big Self-Supervised Models are Strong Semi-Supervised Learners: https://arxiv.org/abs/2006.10029
• (MoCo) Momentum Contrast for Unsupervised Visual Representation Learning: https://arxiv.org/abs/1911.05722
• (MoCo v2) Improved Baselines with Momentum Contrastive Learning: https://arxiv.org/abs/2003.04297
• Other Materials
• SimCLR Slide by Google Brain: https://docs.google.com/presentation/d/1ccddJFD_j3p3h0TCqSV9ajSi2y1yOfh0-lJoK29ircs/edit?usp=sharing
• The Illustrated SimCLR Framework: https://amitness.com/2020/03/illustrated-simclr/
• PR-231 Video: https://www.youtube.com/watch?v=FWhM3juUM6s