
[Reading group slides] SimCSE: Simple Contrastive Learning of Sentence Embeddings

Hayato Tsukagoshi
February 11, 2022


These slides explain SimCSE, a very simple method that uses a pre-trained language model and contrastive learning to set a new state of the art for sentence embeddings.


Transcript

  1. SimCSE: Simple Contrastive Learning of Sentence Embeddings
    Tianyu Gao, Xingcheng Yao, and Danqi Chen
    EMNLP 2021
    URL: https://aclanthology.org/2021.emnlp-main.552.pdf
    Presenter: Hayato Tsukagoshi
    Graduate School of Informatics, Nagoya University, Japan.

  2. Paper overview
    • Proposes SimCSE, a sentence embedding method based on contrastive learning
    • Depending on how positive pairs are built, it comes in two flavors: Unsupervised SimCSE / Supervised SimCSE

    Unsupervised SimCSE (unsup-SimCSE)
    • Two embeddings of the same sentence, produced with different dropout masks, form a positive pair
    • Simply feed the same sentence through the same model twice
    • SOTA among unsupervised methods on STS tasks
    • On par with the existing supervised baseline (Sentence-BERT)

    Supervised SimCSE (sup-SimCSE)
    • Uses entailment pairs from NLI datasets as positives
    • Adding contradiction pairs as hard negatives further improves performance
    • Greatly outperforms other sentence embedding methods on STS, achieving SOTA

  3. Outline
    • Introduction
      • Sentence embeddings
      • Contrastive learning
      • The Semantic Textual Similarity (STS) task
    • Related work on sentence embeddings
      • Building sentence embeddings from word embeddings / sentence similarity measures
      • Pre-BERT sentence embedding models and methods
      • Post-BERT sentence embedding models and methods
    • SimCSE
      • Preliminary experiments
      • Method overview
      • Experimental setup / results

  4. Disclaimer / notes
    • Unless otherwise noted, figures and tables are quoted from the paper
    • If you spot mistakes in the slides or anything that concerns you, please contact @hayato_tkgs
      • Casual chat and research or career consultations are also welcome; feel free to reach out about anything
    • There is a section introducing related work; it covers related work on sentence embeddings in general
      • If you only want the related work cited in the SimCSE paper, please consult the original paper
      • There may be misunderstandings or relevant related work that I missed; again, I would appreciate it if you point these out
    • 🧐 ← statements marked with this symbol are the presenter's own impressions

  5. Introduction

  6. Introduction: sentence embeddings
    • Sentence embedding: represent a sentence as a fixed-length vector
      • Useful for semantic similarity computation, document / question-answering retrieval, and sentence / document clustering
    • In principle anything can be embedded (syntactic and other non-semantic information as well)
      • Most research focuses on vector representations that capture sentence meaning
    • Similarity can be computed easily and at low cost
      • Vector nearest-neighbor libraries [1, 2] can find similar vectors fast (approximately); see the sketch after this slide
    • Sentence embeddings can also serve as features for other tasks

    Evaluation
    • The Semantic Textual Similarity (STS) task
    • Performance on downstream tasks such as text classification … representative example: SentEval [3]
    • Clustering performance (clustering accuracy)

    [1] https://github.com/facebookresearch/faiss
    [2] https://github.com/nmslib/nmslib
    [3] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18
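    A minimal sketch of the nearest-neighbor search mentioned above, using faiss (reference [1]). The random embeddings and the dimensionality are placeholders for illustration, not anything from the slides.

        import numpy as np
        import faiss  # [1] https://github.com/facebookresearch/faiss

        dim = 768
        corpus_emb = np.random.rand(10000, dim).astype("float32")  # placeholder sentence embeddings
        query_emb = np.random.rand(5, dim).astype("float32")       # placeholder query embeddings

        # Normalize so that inner product equals cosine similarity.
        faiss.normalize_L2(corpus_emb)
        faiss.normalize_L2(query_emb)

        index = faiss.IndexFlatIP(dim)            # exact inner-product index
        index.add(corpus_emb)
        scores, ids = index.search(query_emb, 5)  # top-5 most similar corpus sentences per query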

  7. Introduction: the Semantic Textual Similarity (STS) task
    • Uses STS datasets: sentence pairs annotated with semantic similarity scores
    • The model computes a similarity for each sentence pair; its correlation with the human-rated similarity measures how well the model captures sentence meaning
    • For sentence embeddings, the usual protocol is:
      • Use the cosine similarity between the two sentence embeddings as the semantic similarity
      • Unsupervised setting: no training on the STS datasets (e.g. no similarity regression model)
      • Report Spearman's rank correlation between the model's similarities and the human ratings (see the sketch after this slide)
    (The slide shows actual example pairs from an STS dataset)
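    A minimal sketch of the evaluation protocol described above (cosine similarity as the model's similarity, Spearman's rank correlation against human ratings). The function name and arguments are illustrative, not from the paper.

        import numpy as np
        from scipy.stats import spearmanr

        def evaluate_sts(emb1, emb2, gold_scores):
            """emb1, emb2: (n_pairs, dim) embeddings of the two sentences in each pair;
            gold_scores: human similarity ratings (e.g. 0-5)."""
            # Cosine similarity between the two embeddings of each pair.
            emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
            emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
            model_sims = (emb1 * emb2).sum(axis=1)
            # Spearman's rank correlation between model similarities and human ratings.
            return spearmanr(model_sims, gold_scores).correlation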

  8. Introduction: the Semantic Textual Similarity (STS) task
    • STS tasks commonly use STS12-16 [4-8], STS Benchmark [9], and SICK-R [10]
      • All of these datasets label sentence pairs with a real-valued semantic similarity
      • The similarity range is 0-5 for STS12-16 and STS Benchmark, and 1-5 for SICK-R
      • STS12-16 have only test sets; STS Benchmark has train / dev / test sets
    • The STS Benchmark dev set is sometimes used for hyperparameter tuning
      • Besides tuning the learning rate and other hyperparameters, SimCSE evaluates on it every 250 training steps and uses the best checkpoint for final evaluation
    • Beware: the STS evaluation protocol differs from paper to paper
      • e.g. simple vs. weighted averages of Spearman / Pearson (rank) correlation coefficients
      • Appendix B of the SimCSE paper discusses this; recommended reading

    [4] Agirre+: SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, *SEM '12
    [5] Agirre+: *SEM 2013 shared task: Semantic Textual Similarity, *SEM '13
    [6] Agirre+: SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, SemEval '14
    [7] Agirre+: SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, SemEval '15
    [8] Agirre+: SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation, SemEval '16
    [9] Cer+: SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, SemEval '17
    [10] Marelli+: A SICK cure for the evaluation of compositional distributional semantic models, LREC '14

  9. Introduction: the Semantic Textual Similarity (STS) task
    Additional notes
    • Sentence embeddings extracted naively from BERT are known to perform poorly on STS [11]
      • e.g. the average of BERT's contextualized word embeddings, or the [CLS] vector
      • Averaging static word embeddings such as GloVe or fastText performs better than BERT
    • On the other hand, note that BERT-derived sentence embeddings perform reasonably well on downstream tasks (e.g. sentiment classification)
    • The embedding space of pre-trained language models such as BERT is anisotropic [12], and this has been suggested as a cause of the poor STS performance [13]

    [11] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
    [12] Ethayarajh: How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings, EMNLP '19
    [13] Li+: On the Sentence Embeddings from Pre-trained Language Models, EMNLP '20

  10. Introduction: contrastive learning
    • The model outputs feature representations of positive and negative examples
    • Training pushes the similarity between positive examples to be high
    • Very popular in computer vision, and now trending in NLP as well

    SimCLR [15]
    • Two differently augmented views of the same image form a positive pair
    • Strong performance on downstream tasks such as image classification
    • Effective as pre-training for representation learning in CV
    (Figure quoted from the blog post [16])

    [14] Oord+: Representation Learning with Contrastive Predictive Coding, arXiv '18
    [15] Chen+: A Simple Framework for Contrastive Learning of Visual Representations, ICML '20
    [16] Advancing Self-Supervised and Semi-Supervised Learning with SimCLR, '20
    [17] Chen+: Big Self-Supervised Models are Strong Semi-Supervised Learners, NeurIPS '20

  11. Introduction: contrastive learning / training procedure
    • For each example in a mini-batch, treat the other examples in the batch as negatives
    → in-batch negatives
    • Compute a (batch size x batch size) similarity matrix; the diagonal entries (each example against itself, i.e. its own positive) are the correct answers
      • Maximizing the diagonal == maximizing the similarity with (another view of) oneself
    • In practice this is implemented as a cross-entropy loss (a sketch follows after this slide)

    Loss function:
    \mathcal{L}_i = -\log \frac{e^{\mathrm{sim}(h_i, h_i^{+})/\tau}}{\sum_{j=1}^{N} e^{\mathrm{sim}(h_i, h_j^{+})/\tau}}

    • Cosine similarity is the most common choice of similarity function
    • How positive pairs are constructed is crucial
      • In SimCLR, performance varies with the augmentation method
    (The slide shows the batch-size x batch-size similarity matrix for images 1-5 and their augmented views 1'-5'; the blue diagonal cells are the correct answers)
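    A minimal PyTorch sketch of the in-batch-negatives loss above. The embeddings h and h_pos are placeholders for the two sets of representations, and the temperature value follows the later SimCSE slides.

        import torch
        import torch.nn.functional as F

        def in_batch_contrastive_loss(h, h_pos, temperature=0.05):
            """h, h_pos: (batch_size, dim) embeddings; (h[i], h_pos[i]) is a positive pair,
            and all other rows of h_pos act as in-batch negatives for h[i]."""
            h = F.normalize(h, dim=-1)
            h_pos = F.normalize(h_pos, dim=-1)
            sim = h @ h_pos.T / temperature                         # (batch_size, batch_size) cosine similarities
            labels = torch.arange(sim.size(0), device=sim.device)   # diagonal entries are the targets
            return F.cross_entropy(sim, labels)                     # softmax, then maximize the diagonal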

  12. Introduction: contrastive learning / alignment and uniformity
    • Proposes (differentiable) metrics for the quality of learned representations, derived from the properties that good contrastive representations should have [18]
      • lower is better

    Alignment
    • Do similar samples end up close together in the feature space?

    Uniformity
    • How uniformly are the features distributed over the unit hypersphere in the feature space?
    (Figure quoted from the paper; a sketch of the two metrics follows after this slide)

    [18] Wang+: Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, ICML '20
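    A short sketch of the two metrics, following the reference implementation released with [18]. Here x and y are assumed to be L2-normalized embeddings with (x[i], y[i]) a positive pair; the defaults alpha=2 and t=2 are the values used in that paper.

        import torch

        def align_loss(x, y, alpha=2):
            # Average distance between positive pairs; lower means better alignment.
            return (x - y).norm(p=2, dim=1).pow(alpha).mean()

        def uniform_loss(x, t=2):
            # Log of the average pairwise Gaussian potential; lower means the embeddings
            # are spread more uniformly over the unit hypersphere.
            return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()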

  13. Related work on sentence embeddings

  14. Related work: word embeddings → sentence embeddings / sentence similarity measures
    p-mean: computes the sentence embedding as the power mean $\left(\frac{x_1^p + x_2^p + \dots + x_n^p}{n}\right)^{1/p}$ of the word embeddings [19]
    SWEM: pools word embeddings by mean / max / concatenation of mean and max / mean over local windows followed by max [20]
    GEM: computes weights such as novelty from an orthogonal basis of the word embeddings in a sentence and takes a weighted sum [21]
    DynaMax: stacks the word embeddings of the two sentences into a matrix and computes a fuzzy Jaccard coefficient based on fuzzy set theory [22]
    SIF: computes $\frac{1}{|s|}\sum_{w \in s}\frac{a}{a + p(w)} v_w$ → applies SVD to the matrix of sentence embeddings → with the first singular vector $u$ computes $v_s - u u^{\top} v_s$ [23] (a sketch follows after this slide)
    uSIF: uses multiple singular vectors and the sum of singular values; a SIF variant that needs no hyperparameter tuning [24]
    P-SIF: SIF using per-word topic vectors [25]
    All-but-the-Top: applies PCA to the set of word embeddings and removes the top principal components [26]

    [19] Rückle+: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations, arXiv '18
    [20] Shen+: Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL '18
    [21] Yang+: Parameter-free Sentence Embedding via Orthogonal Basis, EMNLP-IJCNLP '19
    [22] Zhelezniak+: Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, ICLR '19
    [23] Arora+: A Simple but Tough-to-Beat Baseline for Sentence Embeddings, ICLR '17
    [24] Ethayarajh: Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline, Rep4NLP '18
    [25] Gupta+: P-SIF: Document Embeddings Using Partition Averaging, AAAI '20
    [26] Mu+: All-but-the-Top: Simple and Effective Postprocessing for Word Representations, ICLR '18
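    A minimal sketch of the SIF computation described above [23]. The lookup tables word_vecs (word → vector) and word_prob (word → unigram probability) are assumptions introduced for illustration; the default a = 1e-3 follows the original SIF paper.

        import numpy as np

        def sif_embeddings(sentences, word_vecs, word_prob, a=1e-3):
            """sentences: list of tokenized sentences (lists of words)."""
            # Weighted average: each word contributes a / (a + p(w)) times its vector.
            emb = np.stack([
                np.mean([a / (a + word_prob[w]) * word_vecs[w] for w in sent], axis=0)
                for sent in sentences
            ])
            # Remove the projection onto the first singular vector (common component removal).
            u = np.linalg.svd(emb.T, full_matrices=False)[0][:, 0]
            return emb - np.outer(emb @ u, u)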

  15. Related work: word embeddings → sentence embeddings / sentence similarity measures
    Word Mover's Distance: optimal transport with a uniform probability mass (the inverse of the sentence length) per word and Euclidean distance as the cost [27]
    Word Mover's Embedding: represents a sentence (document) by the vector of its WMDs to a set of sampled sentences (documents) [28]
    Word Rotator's Distance: optimal transport with word embedding norms as the probability mass and cosine similarity as the cost [29]

    [27] Kusner+: From Word Embeddings To Document Distances, ICML '15
    [28] Wu+: Word Mover's Embedding: From Word2Vec to Document Embedding, EMNLP '18
    [29] Yokoi+: Word Rotator's Distance, EMNLP '20

  16. Related work on sentence embeddings: before BERT
    SkipThought: trains an LSTM encoder-decoder on the Books Corpus to reconstruct the surrounding sentences [30]
    FastSent: trains a Skip-gram-like model to reconstruct the Bag-of-Words of the surrounding sentences [31]
    SDAE: trains an LSTM to reconstruct the denoised sentence from a noised input [31]
    SCDV: clusters word embeddings → obtains topic-aware word embeddings and sparse sentence embeddings [32]
    QuickThought: contrastive learning with a GRU, using the next sentence as the positive and other sentences as negatives [33]
    Sent2Vec: trains a CBOW model that takes n-grams into account [34]
    InferSent*: trains an LSTM via NLI classification [35]
    Universal Sentence Encoder: trains a DAN / Transformer with SkipThought-style unsupervised learning and the NLI dataset [36]

    [30] Kiros+: Skip-Thought Vectors, NIPS '15
    [31] Hill+: Learning Distributed Representations of Sentences from Unlabelled Data, NAACL '16
    [32] Mekala+: SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations, ACL '17
    [33] Logeswaran+: An efficient framework for learning sentence representations, ICLR '18
    [34] Pagliardini+: Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features, NAACL '18
    [35] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP '17
    [36] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
    * indicates supervised training

  17. Related work on sentence embeddings: after BERT / post-processing
    BERT-flow: learns a mapping from BERT's anisotropic sentence embedding space to an isotropic latent space [37]
    BERT-whitening: applies a linear transform so the sentence embeddings have zero mean and identity covariance (+ dimensionality reduction) [38]
    WhiteningBERT: the same zero-mean, identity-covariance linear transform, studied with various models [39]
    SBERT-WK: builds sentence embeddings from per-layer features of BERT / SBERT using quantities such as novelty [40]

    [37] Li+: On the Sentence Embeddings from Pre-trained Language Models, EMNLP '20
    [38] Su+: Whitening Sentence Representations for Better Semantics and Faster Retrieval, arXiv '21
    [39] Huang+: WhiteningBERT: An Easy Unsupervised Sentence Embedding Approach, arXiv '21
    [40] Wang+: SBERT-WK: A Sentence Embedding Method by Dissecting BERT-Based Word Models, IEEE/ACM Transactions on Audio, Speech, and Language Processing '20
    All of these methods take BERT or Sentence-BERT as the base model and do not fine-tune it (a sketch of whitening follows after this slide)
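    A minimal numpy sketch of the whitening transform described above (zero mean, identity covariance, optional dimensionality reduction). This is a rough reconstruction for illustration, not the authors' released code.

        import numpy as np

        def whiten(embeddings, out_dim=None):
            """embeddings: (n_sentences, dim) matrix of sentence embeddings."""
            mu = embeddings.mean(axis=0, keepdims=True)
            cov = np.cov((embeddings - mu).T)      # (dim, dim) covariance matrix
            u, s, _ = np.linalg.svd(cov)
            W = u @ np.diag(1.0 / np.sqrt(s))      # whitening matrix
            if out_dim is not None:
                W = W[:, :out_dim]                 # keep only the leading components
            return (embeddings - mu) @ W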

  18. Related work on sentence embeddings: after BERT / unsupervised
    IS-BERT: maximizes the mutual information between the sentence embedding and the embeddings of n-grams in the sentence [41]
    ★DeCLUTR: contrastive learning with different spans from the same document as positives [42]
    ★BERT-CT: trains two separate copies of the same model so that the dot product of their embeddings of the same sentence is large [43]
    ★ConSERT: contrastive learning with the original sentence as the positive, using adversarial perturbation / token and feature cutoff / shuffling / dropout [44]
    ★SG/SG-OPT: contrastive learning that pulls the sentence embedding towards embeddings from BERT's intermediate layers [45]
    ★CLEAR: contrastive learning during pre-training, with the original sentence as the positive and word or span deletion / reordering / synonym substitution as augmentation [46]
    ★COCO-LM: contrastive learning with a word-substituted sentence and a word-deleted sentence as a positive pair [47]
    TSDAE**: a Transformer version of SDAE; learns a sentence embedding model by reconstructing sentences [48]

    [41] Zhang+: An Unsupervised Sentence Embedding Method by Mutual Information Maximization, EMNLP '20
    [42] Giorgi+: DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations, ACL '21
    [43] Carlsson+: Semantic Re-tuning with Contrastive Tension, ICLR '21
    [44] Yan+: ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer, ACL '21
    [45] Kim+: Self-Guided Contrastive Learning for BERT Sentence Representations, ACL '21
    [46] Wu+: CLEAR: Contrastive Learning for Sentence Representation, arXiv '20
    [47] Meng+: COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining, NeurIPS '21
    [48] Wang+: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning, Findings of EMNLP '21
    ★ marks methods that use contrastive learning
    ** trains the model from scratch rather than fine-tuning a pre-trained language model

  19. Related work on sentence embeddings: after BERT / supervised
    Sentence-BERT: fine-tunes BERT via NLI classification on top of the sentence embeddings [11]
    Augmented SBERT: labels pseudo sentence-pair data with a cross-encoder and trains a bi-encoder on it [49]
    DefSent: uses the word prediction layer of a pre-trained language model to predict the defined word from the embedding of its definition sentence [50]
    ★PairSupCon※: contrastive learning that pulls the embeddings of entailment pairs together, trained jointly with NLI classification [51]
    ★PromptBERT***: contrastive learning that pulls the embeddings of entailment pairs together + removal of template-derived noise [58]

    [11] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
    [49] Thakur+: Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks, NAACL '21
    [50] Tsukagoshi+: DefSent: Sentence Embeddings using Definition Sentences, ACL '21
    [51] Zhang+: Pairwise Supervised Contrastive Learning of Sentence Representations, EMNLP '21
    [58] Jiang+: PromptBERT: Improving BERT Sentence Embeddings with Prompts, arXiv '22
    ★ marks methods that use contrastive learning
    ※ Very similar to SimCSE; its paper cites SimCSE as concurrent work
    *** Published after SimCSE and outperforms it; worth checking out

  20. Proposed method: SimCSE

  21. SimCSE: Simple Contrastive Sentence Embedding
    • A sentence embedding method that fine-tunes a pre-trained language model with contrastive learning
    • Depending on how positive pairs are built, it comes in two flavors: Unsupervised SimCSE / Supervised SimCSE

    Unsupervised SimCSE (unsup-SimCSE)
    • Two embeddings of the same sentence, produced with different dropout masks, form a positive pair
      • Simply feed the same sentence through the same model twice
      • Dropout as a "minimal" form of data augmentation

    Supervised SimCSE (sup-SimCSE)
    • Uses entailment pairs from NLI datasets as positives
      • Does not solve the classification task directly; the labels are used indirectly, to construct positive pairs
      • Adding contradiction pairs as hard negatives further improves performance
    • Trains the embeddings so that semantically entailed sentences end up close together in the sentence embedding space

  22. SimCSE: comparison with prior work
    Unsupervised SimCSE
    • Contrastive learning on unlabeled text alone matches the existing supervised baseline
    • No complex data augmentation or additional networks required
    • Very easy to implement and highly extensible

    Supervised SimCSE
    • Proposes a way to learn sentence embeddings by combining contrastive learning with labeled datasets
      • Applicable beyond sentence embeddings that represent meaning
    • Greatly outperforms the existing supervised baseline
    • Very easy to implement and highly extensible

    🧐 Training is very efficient and stable (performance improves a lot early in fine-tuning)
    • Using supervision labels in contrastive learning had already been explored (though it is not cited) [52]

    [52] Khosla+: Supervised Contrastive Learning, arXiv '20

  23. SimCSE: training procedure
    Unsupervised SimCSE: "embed the same sentence twice and run contrastive learning"
    Supervised SimCSE: "run contrastive learning with entailment sentence pairs as positives"
    (Figure quoted from the paper)

  24. SimCSE: contradiction as hard negatives
    How the NLI datasets (SNLI [53], MNLI [54]) were created:
    • An annotator is shown one premise sentence
    • The annotator writes a hypothesis sentence that stands in an entailment, neutral, or contradiction relation to the premise
    → each premise therefore comes with both an entailment sentence and a contradiction sentence
    • Adding the contradiction sentence as a hard negative improves STS performance

    premise                                      hypothesis                          label
    A man playing an electric guitar on stage.   A man playing banjo on the floor.   contradiction
    A man playing an electric guitar on stage.   A man playing guitar on stage.      entailment
    A man playing an electric guitar on stage.   A man is performing for cash.       neutral
    (Examples from SNLI [53])

    [53] Bowman+: A large annotated corpus for learning natural language inference, EMNLP '15
    [54] Williams+: A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, NAACL '18

  25. SimCSE: computing the similarity matrix
    (This slide shows the similarity matrices as a figure)

  26. SimCSE: intuition 🧐
    Unsupervised SimCSE: "regularization + pushing sentence embeddings apart"
    Supervised SimCSE: "pulling semantically close sentence embeddings together + pushing sentence embeddings apart"

  27. Unsupervised SimCSE: pseudo code
    (Figure created with https://carbon.now.sh/)

  28. Unsupervised SimCSE: pseudo code
    • Feed the same sequence to the same model twice
    • Build a batch_size x batch_size similarity matrix
    • Divide by the temperature parameter
      • The temperature is around 0.05
    • The correct answers are the diagonal entries of the similarity matrix
    • The loss is cross-entropy (softmax, then maximize the correct entry); easy to implement (a rough sketch follows after this slide)
    (Figure created with https://carbon.now.sh/)
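    A rough reconstruction of the training step described above, assuming a Hugging Face BERT encoder with [CLS] pooling. This is not the authors' actual pseudo code; in particular, the paper additionally applies an MLP pooler over [CLS] during training, which is omitted here.

        import torch
        import torch.nn.functional as F
        from transformers import AutoModel, AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
        encoder = AutoModel.from_pretrained("bert-base-uncased")
        encoder.train()  # dropout must be active so the two forward passes differ

        def unsup_simcse_loss(sentences, temperature=0.05):
            batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
            # Feed the same batch through the same model twice; a different dropout mask
            # is sampled on each pass, giving two slightly different embeddings per sentence.
            h1 = encoder(**batch).last_hidden_state[:, 0]   # [CLS] embeddings, first pass
            h2 = encoder(**batch).last_hidden_state[:, 0]   # [CLS] embeddings, second pass
            # batch_size x batch_size cosine-similarity matrix, divided by the temperature.
            sim = F.cosine_similarity(h1.unsqueeze(1), h2.unsqueeze(0), dim=-1) / temperature
            labels = torch.arange(sim.size(0))              # diagonal entries are the targets
            return F.cross_entropy(sim, labels)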

  29. Supervised SimCSE: pseudo code
    (Figure created with https://carbon.now.sh/)

  30. Supervised SimCSE: pseudo code
    • Feed the premises and their entailment / contradiction hypotheses to the model
    • Build one similarity matrix for premise vs. entailment hypotheses and one for premise vs. contradiction hypotheses
    • Concatenate them horizontally
    • The correct answers are the diagonal entries of the (premise vs. entailment) block
    • The loss is cross-entropy (softmax, then maximize the correct entry); easy to implement (a rough sketch follows after this slide)
    (Figure created with https://carbon.now.sh/)
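    A rough sketch of the loss described above, assuming the premise, entailment-hypothesis, and contradiction-hypothesis embeddings have already been computed by the same encoder; the function and variable names are illustrative.

        import torch
        import torch.nn.functional as F

        def sup_simcse_loss(h_prem, h_entail, h_contra, temperature=0.05):
            """h_*: (batch_size, dim) embeddings of premises, entailment hypotheses (positives),
            and contradiction hypotheses (hard negatives)."""
            def cos(a, b):
                return F.cosine_similarity(a.unsqueeze(1), b.unsqueeze(0), dim=-1)
            # Premise-vs-entailment and premise-vs-contradiction similarity matrices,
            # concatenated horizontally: shape (batch_size, 2 * batch_size).
            sim = torch.cat([cos(h_prem, h_entail), cos(h_prem, h_contra)], dim=1) / temperature
            # The target for row i is still column i: the entailment hypothesis of premise i.
            labels = torch.arange(sim.size(0))
            return F.cross_entropy(sim, labels)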

  31. Preliminary experiments

  32. Preliminary experiments
    • Several experiments investigate how effective SimCSE is and in which settings

    Items investigated: Unsupervised SimCSE
    • Effect of different data augmentation methods on performance
    • Comparison with other unsupervised training objectives such as next sentence prediction
    • Search over dropout rates and their effect on performance
    • Evaluation of alignment and uniformity for each method

    Items investigated: Supervised SimCSE
    • Training on several datasets, including NLI, and comparing performance
    • Effect of adding hard negatives

  33. Unsupervised SimCSE: effect of data augmentation
    • Comparison with data augmentation methods other than dropout
    • Training sentences: one million sentences randomly sampled from English Wikipedia
    • Evaluated on the STS Benchmark dev set
    • Dropout outperforms every discrete data augmentation method
    • Is discrete data augmentation ineffective for learning vector representations that embed meaning?
      • Possibly related to the fact that a discrete perturbation can easily change the meaning of a natural-language sentence?

  34. Unsupervised SimCSE: comparison with other unsupervised objectives
    • Compares Unsupervised SimCSE with objectives such as next sentence prediction (NSP)
      • NSP is used in BERT pre-training
    • Unsupervised SimCSE performs best on the STS Benchmark dev set

    🧐 The point of comparing a pre-training objective with a sentence embedding objective is not entirely clear to me

  35. Unsupervised SimCSE: effect of the dropout rate
    • Varies the dropout rate p and measures performance on the STS Benchmark dev set
    • Two settings are extreme cases in which exactly the same embedding is used on both sides:
      • p = 0.0: no dropout
      • Fixed 0.1: dropout is applied, but the same dropout mask is used for both embeddings
    • The Transformer / BERT default of p = 0.1 performs best
      • Performance is fairly sensitive to the dropout rate and to how dropout is applied
      • The p = 0.0 and Fixed 0.1 results suggest that fine-tuning with identical embeddings on both sides hurts performance

    🧐 Conversely, even p = 0.0 learns reasonably well (performance still improves)

  36. Unsupervised SimCSE: evaluating alignment and uniformity
    • Tracks how alignment and uniformity change during SimCSE training
    • Lower is better for both alignment and uniformity
    • For "No dropout" and "Fixed 0.1", uniformity decreases but alignment increases
      • Everything gets pushed apart, even sentence embeddings that are semantically close
    • For "Unsupervised SimCSE", uniformity decreases without alignment increasing (getting worse)

    🧐 Judging from alignment and uniformity alone, "Delete one word" looks best, but…?
      • They do not seem to correlate cleanly with STS performance; presumably only a rough guide?

  37. Supervised SimCSE: comparison across datasets
    • Supervised SimCSE could in principle use datasets other than NLI
      • If hard negatives are set aside, any sentence-pair dataset will do
    • The NLI datasets give the best STS results
    • The NLI datasets are human-annotated and high quality
      • They are designed so that the lexical overlap within each sentence pair is small

  38. Supervised SimCSE: comparison across datasets
    • Supervised SimCSE could in principle use datasets other than NLI
      • If hard negatives are set aside, any sentence-pair dataset will do
    • The NLI datasets give the best STS results
    • The NLI datasets are human-annotated and high quality
      • Moreover, they are designed so that the lexical overlap within each sentence pair is small
    (The results table compares: Quora Question Pairs, image captions, back-translation paraphrases, and NLI with neutral pairs / contradiction pairs / all pairs / entailment pairs as positives)

  39. Evaluation experiments

  40. Evaluation experiments
    • Evaluation on unsupervised STS tasks
      • STS12-16, STS Benchmark, and SICK-R are used as datasets (the standard setting)
    • Evaluation with SentEval (Appendix E)
      • Text classification tasks such as sentiment classification: a (logistic regression) classifier is trained on top of the sentence embeddings, and its accuracy probes the quality of the embeddings

  41. Evaluation: STS
    • "Unsupervised models" do not use supervision labels
      • "Supervised models" use supervision labels
      • Note that this does not mean the STS datasets themselves are used as supervision
    • SimCSE performs best in both the unsupervised and the supervised setting
    • Unsup-SimCSE-BERT_base outperforms SBERT_base
      • An unsupervised method beats the supervised baseline
    • Sup-SimCSE achieves SOTA


  43. Evaluation: SentEval
    🧐 Unlike on STS, performance is not exceptionally high
      • Especially for Unsup-SimCSE
      • Perhaps because the method mainly refines the sentence embedding space?
    • SBERT beats Sup-SimCSE-BERT, while Sup-SimCSE-RoBERTa beats SRoBERTa
    • Downstream-task performance depends on the capability of the underlying model
    • Adding an MLM objective during fine-tuning improves downstream-task performance
      • It prevents catastrophic forgetting
      • It does not improve STS performance


  46. Evaluation: alignment and uniformity per method
    • Post-processing methods such as BERT-flow and BERT-whitening have rather poor (high) alignment
      • Nevertheless, the STS performance of SBERT-flow and the like is quite high
    • Averaged BERT embeddings have poor uniformity
      • Consistent with prior work on anisotropy
    • SBERT, which does not use contrastive learning, also improves uniformity over BERT
    • Supervised / Unsupervised SimCSE improve uniformity as well


  49. Evaluation: distribution of singular values
    • Applies singular value decomposition to the set of sentence embeddings and inspects the distribution of singular values
    • Directions with large singular values can (roughly) be interpreted as carrying most of the information
      • A skewed distribution means the sentence embeddings concentrate in a subspace of the feature space
    • SimCSE's distribution is somewhat flatter than BERT's
    • BERT-flow and BERT-whitening correct the embeddings to be isotropic more directly, so their singular value distributions are also flat (a sketch of this analysis follows after this slide)
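    A small illustrative sketch of this analysis: computing the singular values of a matrix of sentence embeddings and plotting their (log) spectrum. The plotting details are assumptions, not taken from the paper.

        import numpy as np
        import matplotlib.pyplot as plt

        def plot_singular_values(embeddings, label):
            """embeddings: (n_sentences, dim) matrix of sentence embeddings."""
            s = np.linalg.svd(embeddings, compute_uv=False)  # singular values, descending order
            plt.plot(np.log(s), label=label)                 # a flatter curve suggests a more isotropic space
            plt.xlabel("index")
            plt.ylabel("log singular value")
            plt.legend()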

  50. Evaluation: distribution of sentence similarities
    • Visualizes the distribution of pairwise similarities for each bucket of human-rated scores
    • Supervised SimCSE-BERT_base has a wider spread of similarities than plain BERT
    • For (S)BERT-whitening, a peak appears near similarity 0 in the human-score range 0-1; the shape of the distribution changes

  51. Evaluation: qualitative evaluation / similar-sentence retrieval
    • Similar-sentence retrieval on Flickr30k (150k sentences)
    • Compared with SBERT, higher-quality sentences are retrieved (…really?)

    🧐 A bit hard to judge from this evaluation

  52. Summary
    • Proposes a very simple sentence embedding method based on contrastive learning

    Unsupervised SimCSE
    • Positive pairs are embeddings of the same input sentence under different dropout masks
    • Outperforms the supervised baseline on STS tasks

    Supervised SimCSE
    • Uses NLI datasets; entailment sentence pairs serve as positives
    • Greatly outperforms existing methods, achieving SOTA

    • Based on experiments on the distribution of sentence embeddings, the paper argues that improving alignment and uniformity resolves the anisotropy of pre-trained language models and yields high-quality sentence embeddings
    • Because it is so simple, it is widely applicable and easy to build on

  53. Impressions 🧐
    • The simplicity of Unsupervised SimCSE, just passing the same sentence through the same model twice, is wonderful
      • There are many possible refinements, so plenty of follow-up work is likely
      • e.g. how to build good hard negatives without supervision?
    • It is curious that Unsupervised SimCSE and Supervised SimCSE are bundled into one paper
      • Supervised SimCSE is essentially SupCon / SimCLR applied to sentence embeddings, so on its own its contribution might have been seen as small; perhaps that is why?
    • The verification that improving alignment and uniformity really helps the STS task feels a bit thin
      • They do not correlate all that well with STS
      • Nor, apparently, with SentEval…? Maybe CV and language behave differently?
      • The later PromptBERT [58] includes an analysis suggesting that word-frequency information has the largest effect

  54. Appendix

  55. Follow-up work on SimCSE
    GS-InfoNCE: smoothing to reduce the influence of false negatives (pairs that are semantically the same but treated as negatives) [55]
    ESimCSE: word repetition to reduce the length bias in sentence embeddings && a momentum encoder to increase the number of negatives [56]
    S-SimCSE: samples a different dropout rate for each sentence embedding [57]

    [55] Wu+: Smoothed Contrastive Learning for Unsupervised Sentence Embedding, arXiv Sep. '21
    [56] Wu+: ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding, arXiv Sep. '21
    [57] Zhang+: S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence Embedding, arXiv Nov. '21
    Presenter's impression: they are springing up like mushrooms after rain