Slide 1

Slide 1 text

WhitenedCSE: Whitening-based Contrastive Learning of Sentence Embeddings
Wenjie Zhuo, Yifan Sun, Xiaohan Wang, Linchao Zhu, Yi Yang
ACL 2023
https://aclanthology.org/2023.acl-long.677/
Presenter: Hayato Tsukagoshi (D1, Graduate School of Informatics, Nagoya University, Japan)

Slide 2

Slide 2 text

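Overview
• Contrastive learning is effective for learning sentence embeddings, but it only weakly pushes negative examples apart from each other
  • This contributes to the anisotropy problem of the embeddings
• Whitening is promising, but how well it combines with contrastive learning is unknown
• The paper proposes a sentence embedding method that combines contrastive learning with whitening
  • Embeddings are subdivided into groups and whitened
  • Contrastive learning with multiple positives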

Slide 3

Slide 3 text

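Introduction: Sentence embedding
• Dense vector representations of natural-language sentences
• Distances between vectors express how close the sentences are in meaning
[Figure: sentences mapped into the sentence embedding space]
"A child is heading home." → [0.1, 0.2, ...]
"A child is heading home from school." → [0.1, 0.3, ...]
"A child is in the library." → [0.9, 0.8, ...]
"A child is walking in the afternoon." → [0.5, 0.7, ...]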

Slide 4

Slide 4 text

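Introduction: Sentence embedding
• Dense vector representations of natural-language sentences
• Distances between vectors express how close the sentences are in meaning
[Figure: the same four sentences and vectors as on Slide 3, with annotations: the first two sentences are semantically similar; sentences with similar meanings lie close together; the distance between vectors expresses their semantic relationship]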

Slide 5

Slide 5 text

Introduction: Contrastive Learning
• The model outputs feature representations for positive and negative examples
• Training raises the similarity between positive pairs
• Hugely popular in computer vision and now trending in NLP as well
SimCLR
• Treats differently augmented views of the same image as a positive pair
• Achieves high performance on downstream tasks such as image classification
• Effective as pre-training for representation learning in CV
Figure taken from blog [16]
Oord+: Representation Learning with Contrastive Predictive Coding, arXiv '18
Chen+: A Simple Framework for Contrastive Learning of Visual Representations, ICML '20
Advancing Self-Supervised and Semi-Supervised Learning with SimCLR, '20
Chen+: Big Self-Supervised Models are Strong Semi-Supervised Learners, NeurIPS '20

Slide 6

Slide 6 text

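Contrastive Learning / Training procedure
• Compute the similarity matrix and treat its diagonal entries (similarities between positive pairs) as the correct answers
  • Maximizing the diagonal similarities == maximizing the similarity between positive pairs
in-batch negatives
• For each example in a mini-batch, the other examples in the batch are treated as negatives
Loss function (InfoNCE)
$$
\mathcal{L}_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^+)/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i, h_j^+)/\tau}}
$$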
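A minimal sketch of the InfoNCE loss with in-batch negatives, written in plain Python for readability. It assumes that sim is cosine similarity and that τ = 0.05; the slide fixes neither, and the function names are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, tau=0.05):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive for anchors[i]; every other positives[j]
    acts as an in-batch negative. tau=0.05 is an assumed temperature.
    """
    n = len(anchors)
    losses = []
    for i in range(n):
        # Row i of the similarity matrix, scaled by the temperature.
        logits = [cosine(anchors[i], positives[j]) / tau for j in range(n)]
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # -log(exp(logit_ii) / sum_j exp(logit_ij))
        losses.append(log_denom - logits[i])
    return sum(losses) / n


anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # diagonal entries are the most similar
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives assigned to the wrong anchors
loss_aligned = info_nce(anchors, aligned)
loss_swapped = info_nce(anchors, swapped)
```

The loss is small when the diagonal of the similarity matrix dominates each row and large when an off-diagonal entry does, which is what "treating the diagonal as the correct answer" amounts to.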

Slide 7

Slide 7 text

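Alignment / Uniformity
• (Differentiable) metrics (loss functions) that measure the quality of feature representations
  • The better (lower) these are, the better the representation (or so it is considered)
Alignment
• Whether similar samples lie close together in the feature space
Uniformity
• Whether the feature representations are uniformly distributed on the unit hypersphere
Wang+: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, ICML '20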
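The two metrics can be sketched as follows, using the common definitions from Wang+ (alignment as the mean squared distance between positive pairs, uniformity as the log of the mean Gaussian potential over all pairs, with t = 2). The toy vectors and function names are assumptions for illustration.

```python
import math
from itertools import combinations


def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment(positive_pairs):
    """Mean squared distance between normalized positive pairs (lower is better)."""
    dists = [sq_dist(normalize(x), normalize(y)) for x, y in positive_pairs]
    return sum(dists) / len(dists)


def uniformity(embeddings, t=2.0):
    """Log of the mean pairwise Gaussian potential (lower is better)."""
    points = [normalize(e) for e in embeddings]
    potentials = [math.exp(-t * sq_dist(u, v)) for u, v in combinations(points, 2)]
    return math.log(sum(potentials) / len(potentials))


# Toy data: one set spread around the circle, one set clumped in a narrow cone.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
clumped = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.2], [0.97, 0.3]]
u_spread = uniformity(spread)
u_clumped = uniformity(clumped)
a_perfect = alignment([([1.0, 2.0], [2.0, 4.0])])  # same direction after normalization
```

The clumped set mimics an anisotropic embedding space, so its uniformity value is higher (worse) than that of the spread set.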

Slide 8

Slide 8 text

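Prior work: SimCSE
• Pioneering work on sentence embeddings + contrastive learning
• Unsupervised SimCSE: "embed the same sentence twice and apply contrastive learning"
• Supervised SimCSE: "apply contrastive learning with sentences in an entailment relation as positives"
Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP 2021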

Slide 9

Slide 9 text

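SimCSE: Motivation
Unsupervised SimCSE: "regularization + pushing sentence embeddings apart from each other"
Supervised SimCSE: "pulling semantically close sentence embeddings together + pushing the other sentence embeddings apart"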

Slide 10

Slide 10 text

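Whitening
• Transforms the data so that it is uniformly distributed on a hypersphere
  • Mean: 0
  • Covariance matrix: identity matrix
Methods used for whitening
• Principal Component Analysis (PCA): eigendecomposition of the data covariance matrix
• Zero-phase Component Analysis (ZCA): PCA + undoing the rotation
Anisotropy
• The tendency of data to occupy only a low-dimensional subspace of a high-dimensional space
  • Whitening helps remove anisotropy (isotropically distributed embeddings often perform better)
$$
H = WZ, \qquad HH^\top = I
$$
$$
WZ(WZ)^\top = WZZ^\top W^\top
$$
$$
ZZ^\top = U \Lambda U^\top
$$
$$
W_{\mathrm{PCA}} = \Lambda^{-1/2} U^\top, \qquad W_{\mathrm{ZCA}} = U \Lambda^{-1/2} U^\top
$$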
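The PCA and ZCA whitening matrices can be checked numerically with a short NumPy sketch. It assumes the columns of Z are centered samples and, unlike the slide's unnormalized ZZ^T, divides the covariance by the number of samples; the small `eps` is also an added assumption for numerical stability.

```python
import numpy as np


def whitening_matrices(Z, eps=1e-8):
    """Return (W_pca, W_zca) for centered data Z of shape (d, n).

    Columns of Z are samples. eps guards against tiny eigenvalues
    (an assumption, not part of the slide's formulas).
    """
    n = Z.shape[1]
    cov = Z @ Z.T / n                      # covariance matrix, shape (d, d)
    eigvals, U = np.linalg.eigh(cov)       # cov = U diag(eigvals) U^T
    inv_sqrt = np.diag(1.0 / np.sqrt(eigvals + eps))
    W_pca = inv_sqrt @ U.T                 # Lambda^{-1/2} U^T
    W_zca = U @ inv_sqrt @ U.T             # U Lambda^{-1/2} U^T
    return W_pca, W_zca


rng = np.random.default_rng(0)
mix = np.array([[2.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 0.2]])
Z = mix @ rng.normal(size=(3, 500))        # correlated, anisotropic toy data
Z = Z - Z.mean(axis=1, keepdims=True)      # mean 0

W_pca, W_zca = whitening_matrices(Z)
H = W_zca @ Z
cov_H = H @ H.T / Z.shape[1]               # should be close to the identity
cov_H_pca = (W_pca @ Z) @ (W_pca @ Z).T / Z.shape[1]
```

Both transforms give an identity covariance; ZCA additionally rotates back with U, so its matrix is symmetric and the whitened data stays as close as possible to the original axes.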

Slide 11

Slide 11 text

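WhitenedCSE
• SimCSE also improves alignment / uniformity, but...
  • Contrastive learning alone cannot push negatives apart from each other
Shuffled Group Whitening (SGW)
• Shuffle, then whiten each group separately
• Restore the original order and use the result as the output embedding
Multi-Positive Contrastive Loss
• Repeat SGW several times to increase the number of positives
• Training on diverse positives improves robustness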

Slide 12

Slide 12 text

Shuffled Group Whitening: pseudocode
https://github.com/SupstarZh/WhitenedCSE/blob/master/whitenedcse/shuffled_group_whitening.py

Slide 13

Slide 13 text

Shuffled Group Whitening: pseudocode
Annotation: shuffle and split into groups

Slide 14

Slide 14 text

Shuffled Group Whitening: pseudocode
Annotation: set the mean to 0

Slide 15

Slide 15 text

Shuffled Group Whitening: pseudocode
Annotation: (G, d, B) x (G, B, d) → (G, d, d)

Slide 16

Slide 16 text

Shuffled Group Whitening: pseudocode
Annotation: eigendecomposition of the covariance matrix

Slide 17

Slide 17 text

Shuffled Group Whitening: pseudocode
Annotation: restore the original order
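The steps annotated on Slides 13 to 17 can be sketched in NumPy as below. This is not the authors' implementation (that is at the repository URL on Slide 12). It assumes the shuffle is over feature dimensions, that ZCA is used within each group, and that a Python loop over groups stands in for the batched (G, d, B) x (G, B, d) product; all names and the `eps` value are illustrative.

```python
import numpy as np


def zca_whiten(x, eps=1e-5):
    """Whiten x of shape (B, d) over the batch axis with ZCA."""
    x = x - x.mean(axis=0, keepdims=True)           # set the mean to 0
    cov = x.T @ x / x.shape[0]                      # (d, B) x (B, d) -> (d, d)
    eigvals, U = np.linalg.eigh(cov)                # eigendecompose the covariance
    W = U @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ U.T
    return x @ W                                    # W is symmetric


def shuffled_group_whitening(x, num_groups, rng):
    """x: (B, D) batch of embeddings; D must be divisible by num_groups."""
    B, D = x.shape
    assert D % num_groups == 0
    perm = rng.permutation(D)                       # shuffle feature dimensions
    groups = np.split(x[:, perm], num_groups, axis=1)   # split into groups
    whitened = np.concatenate([zca_whiten(g) for g in groups], axis=1)
    inverse = np.argsort(perm)                      # restore the original order
    return whitened[:, inverse]


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8)) @ (np.eye(8) + 0.3 * np.ones((8, 8))) + 5.0
# Repeating SGW with different shuffles yields several views of the same batch,
# which is how the slides describe obtaining multiple positives.
views = [shuffled_group_whitening(x, num_groups=4, rng=rng) for _ in range(3)]
```

Each view has zero mean and unit variance per dimension, but the views differ because different dimensions are decorrelated together under each shuffle.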

Slide 18

Slide 18 text

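Experiments
Tasks: standard benchmarks for sentence embeddings
• Semantic Textual Similarity (STS)
• SentEval
Additional evaluation metrics
• Uniformity, Alignment
Training setup
• 1 million sentences randomly sampled from English Wikipedia (unlabeled)
• Fine-tuning of BERT and RoBERTa
• Whitening is applied per batch, with 384 groups (2 dimensions per group) (BERT)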

Slide 19

Slide 19 text

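Semantic Textual Similarity (STS)
• Evaluates a sentence embedding model's ability to capture meaning through correlation with human judgments
• Sentence pairs are manually annotated with semantic similarity scores
• Evaluated by the correlation coefficient between the human scores and the similarities computed by the model
  • Pearson's (product-moment) correlation coefficient
  • Spearman's rank correlation coefficient
• Sentence embedding evaluation uses the unsupervised setting
  • No training on STS data
  • An already trained model is evaluated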

Slide 20

Slide 20 text

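Unsupervised STS task
Evaluation procedure for unsupervised STS
① Prepare a sentence embedding model with frozen parameters
② Convert each sentence of a pair into a sentence embedding
③ Compute the similarity of the pair of sentence "vectors"
  • Cosine similarity is commonly used
④ Compute the (rank) correlation coefficient with the human scores
• A higher correlation coefficient means a "better sentence embedding"
[Figure: sentences A and B are each encoded by the frozen sentence embedding model (①, ②), their similarity is computed (③), and it is evaluated by its correlation with human scores (④)]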
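The evaluation procedure above can be sketched with cosine similarity and Spearman's rank correlation in plain Python. The embeddings and gold scores below are made-up toy values standing in for the output of a frozen model and for human annotations.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# (embedding of sentence A, embedding of sentence B, hypothetical human score)
pairs = [
    ([1.0, 0.0], [0.9, 0.1], 4.8),
    ([1.0, 0.0], [0.5, 0.5], 3.0),
    ([1.0, 0.0], [0.0, 1.0], 1.2),
    ([1.0, 0.0], [-1.0, 0.2], 0.3),
]
predicted = [cosine(a, b) for a, b, _ in pairs]   # step ③
gold = [score for _, _, score in pairs]
rho = spearman(predicted, gold)                   # step ④
```

Because only the ordering matters for Spearman's coefficient, the model does not need to reproduce the scale of the human scores, only their ranking.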

Slide 21

Slide 21 text

SentEval
• A toolkit that collects downstream tasks such as text classification
• Trains a classifier that takes sentence embeddings as input and evaluates the quality of the embeddings from its classification performance
Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18

Task | Classification target | Classes | Example sentence
MR | Movie review polarity (pos/neg) | 2 | Too slow for a younger crowd, too shallow for an older one.
CR | Product review polarity (pos/neg) | 2 | We tried it out christmas night and it worked great.
SUBJ | Subjectivity of movie reviews / plot summaries | 2 | A movie that doesn’t aim too high, but doesn’t need to.
MPQA | Phrase polarity | 2 | would like to tell
SST-2 | Movie review polarity (pos/neg) | 2 | Audrey Tautou has a knack for picking roles that magnify her [..]
TREC | Question type | 6 | What are the twin cities?
MRPC | Whether two sentences are paraphrases | 2 | The procedure is generally performed in the second or third trimester. & The technique is used during the second and, occasionally, third trimester of pregnancy.

Slide 22

Slide 22 text

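SentEval
Evaluation procedure for SentEval
① Prepare a sentence embedding model with frozen parameters
② Convert each sentence into a sentence embedding
③ Train a classifier that takes the sentence embeddings as input
④ Evaluate the quality of the sentence embeddings from the classifier's performance
• Higher classification performance means a "better sentence embedding"
• The classifier is usually a logistic regression classifier
[Figure: a sentence is encoded by the frozen sentence embedding model (①, ②), a classifier is trained on the embedding (③), and embedding quality is judged from classification performance (④)]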
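A minimal sketch of steps ③ and ④: a logistic regression classifier trained by gradient descent on fixed embeddings, then scored by accuracy. The 2-D "embeddings", labels, learning rate, and epoch count are toy assumptions; SentEval itself provides its own classifiers and data splits.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=200):
    """Binary logistic regression trained with full-batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    preds = [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0 for xi in X]
    return sum(int(p == t) for p, t in zip(preds, y)) / len(y)


# Step ②: embeddings are produced once by the frozen model (toy values here).
train_X = [[1.0, 0.2], [0.8, 0.1], [0.9, -0.1], [-1.0, 0.1], [-0.7, -0.2], [-0.9, 0.3]]
train_y = [1, 1, 1, 0, 0, 0]
test_X = [[0.6, 0.0], [-0.5, 0.1]]
test_y = [1, 0]

w, b = train_logreg(train_X, train_y)      # step ③: only the classifier is trained
test_acc = accuracy(w, b, test_X, test_y)  # step ④: proxy for embedding quality
```

Only the classifier's weights are updated, so any difference in accuracy between two embedding models reflects how linearly separable their embeddings make the classes.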

Slide 23

Slide 23 text

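Compared methods
BERT-flow: learns a mapping from the anisotropic BERT sentence embedding space to an isotropic latent space
BERT-whitening: applies a linear transformation so that the sentence embeddings have mean 0 and identity covariance (+ dimensionality reduction)
IS-BERT: trained to maximize the mutual information between a sentence embedding and the embeddings of the n-grams in the sentence
BERT-CT: trained so that the inner product between the embeddings of the same sentence from two separate copies of the same model becomes large
SimCSE: contrastive learning with the same sentence under different dropout masks as positives, or with entailment sentence pairs as positives
MixCSE: unsupervised contrastive learning that uses mixtures of different sentences as hard negatives
ArcCSE: combines margin-based angle minimization for entailment-pair sentence embeddings with a triplet loss that uses data-augmented sentences as negatives
DCLR: Unsup-SimCSE with Gaussian noise added as negatives + per-instance weighting
MoCoSE: contrastive learning with an analysis of the optimal number of negatives for a momentum encoder + data augmentation by FGSM
Li+: On the Sentence Embeddings from Pre-trained Language Models, EMNLP '20
Su+: Whitening Sentence Representations for Better Semantics and Faster Retrieval, arXiv '21
Zhang+: An Unsupervised Sentence Embedding Method by Mutual Information Maximization, EMNLP '20
Carlsson+: Semantic Re-tuning with Contrastive Tension, ICLR '21
Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
Zhang+: Unsupervised Sentence Representation via Contrastive Learning with Mixing Negatives, AAAI '22
Zhang+: A Contrastive Framework for Learning Sentence Representations from Pairwise and Triple-wise Perspective in Angular Space, ACL '22
Zhou+: Debiased Contrastive Learning of Unsupervised Sentence Representations, ACL '22
Cao+: Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding, ACL Findings '22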

Slide 24

Slide 24 text

Results: STS / BERT
• Best performance with both BERT-base and BERT-large

Slide 25

Slide 25 text

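Results: STS / BERT
• Best performance with both BERT-base and BERT-large
Presenter's note: from reading the code, sentence 1 and sentence 2 of each STS pair appear to be whitened separately
It is unclear whether whitening is also applied at evaluation time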

Slide 26

Slide 26 text

Results: SentEval
• The margin is small, but the method achieves the best performance here as well

Slide 27

Slide 27 text

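Alignment / Uniformity
• Uniformity is good from the early stage of training (lower is better)
  • This is because whitening makes the embeddings uniformly distributed from the start of training
[Figure: Alignment and Uniformity plots comparing SimCSE and WhitenedCSE]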

Slide 28

Slide 28 text

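Visualization of WhitenedCSE embeddings
• Visualization of the embeddings after training
• The WhitenedCSE embeddings look the most uniformly distributed (or so it seems)
[Figure: embedding visualizations for BERT, SimCSE, and WhitenedCSE]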

Slide 29

Slide 29 text

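Ablation: Group Size
• More groups (fewer dimensions per group) gives higher performance
  • This amounts to a milder whitening (decorrelation)
• Shuffling yields a stable performance improvement
• I wish they would not run ablations on test-set results
[Table: performance differences by group size]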

Slide 30

Slide 30 text

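Ablation: Whitening method and modules
• Performance varies greatly with the whitening method
  • Group whitening appears quite effective
  • Shuffling improves it further
• Even without whitening (grouping plus contrastive learning with multiple positives), performance improves somewhat
  • However, the gain appears larger when combined with SGW
[Tables: whitening method; with and without each module]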

Slide 31

Slide 31 text

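Summary
• The paper proposes a sentence embedding method that combines contrastive learning with whitening
  • Embeddings are split into groups and whitened (SGW)
  • Contrastive learning with multiple positives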

Slide 32

Slide 32 text

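Summary
• The paper proposes a sentence embedding method that combines contrastive learning with whitening
  • Embeddings are split into groups and whitened (SGW)
  • Contrastive learning with multiple positives
Impressions
• Frankly, the evaluation looks a bit questionable (the way whitening is applied is a major concern)
• Performance is quite low when whitening without grouping
  • Perhaps because the feature representations change too drastically?
  • Since the groups are small, the whitening being applied is mild
• If mild whitening is enough, the whitening operation itself may be unnecessary?
  • What about a loss that pushes the covariance matrix toward the identity?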