
[Reading-group slides] Language-agnostic BERT Sentence Embedding


Slides explaining the paper on Language-agnostic BERT Sentence Embedding (LaBSE), a multilingual sentence embedding method.

Hayato Tsukagoshi

May 24, 2022

Transcript

  1. Language-agnostic BERT
     Sentence Embedding
     Presenter: Hayato Tsukagoshi
     M2, Graduate School of Informatics, Nagoya University, Japan
     Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang
     ACL 2022
     URL: https://arxiv.org/abs/2007.01852

  2. • Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
     • Multilingual pre-training → contrastive learning of sentence embeddings on a translation corpus
       • MLM + Translation Language Modeling → Additive Margin Softmax
     • A range of evaluation experiments
       • Greatly improves cross-lingual retrieval performance
       • Especially strong on low-resource languages
       • Monolingual STS/SentEval scores are not high
     Why this paper was chosen
     • Rich in sentence-embedding topics and evaluations
       • An instructive paper on (multilingual) sentence embeddings
     LaBSE: Language-agnostic BERT Sentence Embedding
     2
     https://arxiv.org/abs/2007.01852

  3. Introduction

  6. • Dense vector representations of natural-language sentences
     • Distances between the vectors express how close the sentences are in meaning
     Introduction: Sentence embedding
     6
     [Figure: four example sentences ("A child is heading home.", "A child is heading home from school.",
     "A child is in the library.", "A child is walking in the afternoon.") mapped to vectors
     ([0.1, 0.2, ...], [0.1, 0.3, ...], [0.9, 0.8, ...], [0.5, 0.7, ...]) in the sentence embedding space;
     sentences with similar meanings are placed nearby, so distances between vectors express semantic relations]

  8. • Origin of the name "embedding"
       • A word sequence is a sequence of extremely high-dimensional (vocabulary-sized) vectors
       • A sentence is instead represented by a lower-dimensional vector
       • Said to be a term borrowed from manifold theory
     Usefulness and applications
     • Similar sentence/document retrieval and clustering
     • Lightweight sentence/document classification and feature extraction (reused in other tasks)
     • Question answering via dense vector search (Dense Passage Retrieval)
     • Example-based translation with translation memories, example-based machine learning
     • The range of applications is extremely broad
     Introduction: Sentence embedding
     8

  9. Before BERT
     • Obtain sentence embeddings from (weighted) averages of word embeddings
       • p-mean [01], SWEM [02], DynaMax [03], SIF [04], uSIF [05], etc.
     • Build dedicated sentence embedding models
       • Skip-Thought [06], SCDV [07], InferSent [08], USE [09], etc.
     [01] Rückle+: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations, arXiv '18
     [02] Shen+: Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL '18
     [03] Zhelezniak+: Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, ICLR '19
     [04] Arora+: A Simple but Tough-to-Beat Baseline for Sentence Embeddings, ICLR '17
     [05] Ethayarajh: Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline, Rep4NLP '18
     [06] Kiros+: Skip-Thought Vectors, NIPS '15
     [07] Mekala+: SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations, ACL '17
     [08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP '17
     [09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
     Introduction: Representative sentence embedding methods
     9

  10. Sentence-BERT [10]
      • Fine-tunes BERT on Natural Language Inference (NLI)
      • A pioneering study of sentence embedding models built on
        pre-trained language models (PLMs)
        • Essentially InferSent done with BERT
      • Substantially advanced the state of the art (SOTA) at the time
        • SimCSE (described later) is nearly a strict upgrade,
          so Sentence-BERT may see little use from now on?
      [10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
      Introduction: Representative sentence embedding methods
      10
      Figure taken from the cited paper

  11. SimCSE [11]
      • Fine-tunes BERT with contrastive learning
        • Unsupervised SimCSE: embed the same sentence twice and contrast the two embeddings
        • Supervised SimCSE: contrastive learning with entailed sentence pairs as positives
      • SOTA by a wide margin; derivative studies keep appearing
      [11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
      Introduction: Representative sentence embedding methods
      11
      Figures taken from the cited paper. Slides from an earlier reading-group session on SimCSE are available separately

  14. • How should the "goodness" of sentence embeddings be evaluated?
        • The discussion is, I think, not yet settled
        • What kind of sentence embeddings should be built? — still an open question
      • Even so, we have to evaluate against some criterion
      Evaluation benchmarks
      • Semantic Textual Similarity (STS): correlation between similarities judged by humans and by the model
      • SentEval: performance on downstream tasks such as text classification [12, 13]
      • SentGLUE: GLUE [14] restricted to approaches that use sentence embeddings [15]
      • Clustering, text retrieval
      [12] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18
      [13] Conneau+: What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, ACL '18
      [14] Wang+: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, EMNLP Workshop BlackboxNLP '18
      [15] Ni+: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models, CoRR '21
      Introduction: Evaluating sentence embeddings
      14
      STS and SentEval are the most widely used

  17. • Evaluate how well a sentence embedding model captures meaning
        via correlation with human judgments
      • Sentence pairs are manually annotated with semantic similarity
      • Evaluation uses the correlation coefficient between the human ratings
        and the similarities computed by the model
        • Pearson (product-moment) correlation coefficient
        • Spearman rank correlation coefficient
      • Sentence embedding evaluation uses the unsupervised setting
        • No training on STS data
        • A model trained in advance is evaluated as-is
      Introduction: Semantic Textual Similarity (STS)
      17

  19. Evaluation procedure for unsupervised STS
      ① Prepare a sentence-pair dataset
      ② Prepare a sentence embedding model
      ③ Turn each sentence of a pair into a sentence vector
      ④ Compute the similarity of each sentence-"vector" pair
        • Cosine similarity is the usual choice
      ⑤ Compute the (rank) correlation coefficient against the human ratings
      Introduction: Unsupervised STS
      19
      [Diagram: the sentence embedding model embeds sentences A and B; the correlation coefficient
      is computed between the human-rated similarity and the model-computed similarity]
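Step ④ of the procedure above is typically just a cosine between the two sentence vectors. A minimal numpy sketch; the vectors here are made-up stand-ins for model output, not from the deck:

```python
import numpy as np

def cosine_similarity(u, v):
    # Step 4 of the procedure: similarity between two sentence vectors
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Stand-in embeddings for sentences A and B of one pair
vec_a = np.array([0.1, 0.2, 0.4])
vec_b = np.array([0.1, 0.3, 0.5])
sim = cosine_similarity(vec_a, vec_b)  # a value in [-1, 1]; close to 1 here
```

Repeating this over every pair in the dataset produces the model-similarity column that step ⑤ correlates with the human ratings.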

  27. Introduction: Computing Spearman's rank correlation for STS
      27
      Sentence A                  Sentence B                               Human   Model 1   Model 2
      A man is playing a guitar.  The man is playing the guitar.           4.909   0.985     0.978
      A man is playing a guitar.  A guy is playing an instrument.          3.800   0.646     0.895
      A man is playing a guitar.  A man is playing a guitar and singing.   3.200   0.874     0.977
      A man is playing a guitar.  The girl is playing the guitar.          2.250   0.747     0.831
      A man is playing a guitar.  A woman is cutting vegetable.            0.000   0.290     0.595
      Ranking the sentence pairs by similarity gives human ranks (1, 2, 3, 4, 5),
      Model 1 ranks (1, 4, 2, 3, 5), and Model 2 ranks (1, 3, 2, 4, 5).
      The correlation between the gold ranks and the predicted ranks is then
      computed by plugging them into the formula:
      r1 = 1 − 6 / (5(5² − 1)) × {(1−1)² + (2−4)² + (3−2)² + (4−5)² + (5−5)²}
         = 1 − (6 / 120) × (0 + 4 + 1 + 1 + 0) = 0.7
      Likewise r2 = 0.9, so Model 2 is the better model
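The computation on this slide can be reproduced directly. The ranking helper below assumes there are no ties, which holds for these five pairs:

```python
def rank_desc(values):
    # Rank from most to least similar (1 = highest); assumes no ties,
    # which holds for the five pairs on the slide.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return ranks

def spearman(gold, pred):
    # The slide's formula: r = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
    # where d_i is the rank difference for pair i.
    n = len(gold)
    d2 = sum((g - p) ** 2 for g, p in zip(rank_desc(gold), rank_desc(pred)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

human   = [4.909, 3.800, 3.200, 2.250, 0.000]
model_1 = [0.985, 0.646, 0.874, 0.747, 0.290]
model_2 = [0.978, 0.895, 0.977, 0.831, 0.595]

r1 = spearman(human, model_1)  # ≈ 0.7, matching the slide
r2 = spearman(human, model_2)  # ≈ 0.9 — Model 2 is the better model
```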

  29. English datasets
      • STS12, 13, 14, 15, 16 [16, 17, 18, 19, 20]
      • STS Benchmark (test set) [21]
      • SICK-R [22]
      Japanese datasets
      • JSICK [23]
      • JSTS [24]
      [16] Agirre+: SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, *SEM '12
      [17] Agirre+: *SEM 2013 shared task: Semantic Textual Similarity, *SEM '13
      [18] Agirre+: SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, SemEval '14
      [19] Agirre+: SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, SemEval '15
      [20] Agirre+: SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation, SemEval '16
      [21] Cer+: SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, SemEval '17
      [22] Marelli+: A SICK cure for the evaluation of compositional distributional semantic models, LREC '14
      [23] Yanaka+: JSICK: Construction of a Japanese compositional inference and similarity dataset, JSAI 35th Annual Conference (2021)
      [24] Kurihara+: JGLUE: A Japanese language understanding benchmark, 28th Annual Meeting of the Association for Natural Language Processing (2022)
      Introduction: Evaluation datasets for STS
      29
      STS12–16 are each a collection of small datasets;
      the "sub"-datasets are usually pooled when computing the correlation coefficient.
      The final score is most often the average over STS12–16, STS Benchmark, and SICK-R

  30. • A toolkit bundling downstream tasks such as text classification
      • Train a classifier that takes sentence embeddings as input;
        the classification performance measures the quality of the embeddings
      [25] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18
      Introduction: SentEval [25]
      30
      Task list
      Task    Type                           #train   #test   #class
      MR      movie review                   11,000   11,000  2
      CR      product review                  4,000    4,000  2
      SUBJ    subjectivity status            10,000   10,000  2
      MPQA    opinion-polarity               11,000   11,000  2
      SST-2   binary sentiment analysis      67,000    1,800  2
      TREC    question-type classification    6,000      500  6
      MRPC    paraphrase detection            4,100    1,700  2

  32. SentEval evaluation procedure
      ① Prepare a sentence embedding model with frozen parameters
      ② Train a classifier that takes the sentence embeddings as input
      ③ Judge the quality of the embeddings from the classifier's performance
      • Assumption: higher classification performance means "better sentence embeddings"
      • The classifier is usually logistic regression
        • i.e., classification by a weighted sum of the embedding dimensions
      • Evaluates the performance of a "pre-trained" sentence embedding model
      Introduction: SentEval
      32
      [Diagram: sentence embedding model → classifier → embedding quality judged from classification performance]
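The procedure can be sketched end to end with a toy probe. The random "embeddings" and the plain gradient-descent logistic regression below are stand-ins (SentEval itself uses real encoders and a tuned classifier); only the frozen-encoder-plus-linear-probe structure is the point:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 stand-in: frozen 8-dim "sentence embeddings" for a toy binary task
# whose label depends only on the first embedding dimension.
n, dim = 200, 8
X = rng.normal(size=(n, dim))
y = (X[:, 0] > 0).astype(float)

# Step 2: train a logistic regression classifier on top of the frozen
# embeddings (the encoder itself is never updated).
w, b = np.zeros(dim), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid of a weighted sum of dimensions
    w -= 0.5 * (X.T @ (p - y) / n)
    b -= 0.5 * float(np.mean(p - y))

# Step 3: read the classification accuracy as a proxy for embedding quality
pred = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
acc = float(np.mean(pred == (y == 1.0)))
```

Because the toy label is linearly recoverable from the embeddings, the probe reaches high accuracy; under SentEval's assumption that would count as a "good" embedding for this task.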

  33. Proposed method: LaBSE

  34. • Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
      • Multilingual pre-training → contrastive learning of sentence embeddings on a translation corpus
        • MLM + Translation Language Modeling → Additive Margin Softmax
      • A range of evaluation experiments
        • Greatly improves cross-lingual retrieval performance
        • Especially strong on low-resource languages
        • Monolingual STS/SentEval scores are not high
      Why this paper was chosen
      • Rich in sentence-embedding topics and evaluations
        • An instructive paper on (multilingual) sentence embeddings
      LaBSE: Language-agnostic BERT Sentence Embedding
      34
      https://arxiv.org/abs/2007.01852

  35. • Comparison with prior work
      • Components of LaBSE
        • Dual-encoder architecture
        • Translation ranking task
        • MLM and TLM pre-training
      • Related work on multilingual sentence embeddings
      • Experimental setup, training tricks, and evaluation methods
      • Experimental results
      • Analysis
      • Additional experiments
      Outline
      35

  36. • Pre-training followed by fine-tuning to build a multilingual sentence embedding model
      • High performance on low-resource languages
        • Also performs well on language pairs absent from the training data
      • The individual methods and techniques are almost all from prior work
        • The contribution of this work is a well-chosen combination, large-scale data,
          and engineering for large-scale training
      LaBSE: Comparison with prior work
      36

  37. Dual-encoder architecture
      • One common training strategy, standard in sentence embedding methods
      • Sentence-BERT, SimCSE, and others also use this scheme
      Translation ranking task
      • Train so that translation pairs score higher similarity than any other sentence pair
      • Training is refined with additive margin softmax
      MLM and TLM pre-training
      • Masked Language Modeling (MLM)
      • Translation Language Modeling (TLM)
      Components of LaBSE
      37

  39. • Two encoders build the sentence embedding representations
        • The encoders usually share weights (= they are the same model)
        • Also called a Siamese network
      Components of LaBSE: Dual-encoder architecture
      39
      [Diagram: an Encoder-Decoder (Encoder → Decoder) contrasted with a Dual-Encoder
      (two weight-sharing Encoders whose outputs feed the loss computation).
      Enc-Dec tasks: generating surrounding sentences, translation generation, denoising AE.
      Dual-encoder tasks: recognizing textual entailment, contrastive learning]

  42. • Proposed by Guo et al. [26]
      • Train so that the similarity of a translation pair is higher than that of any other sentence pair
        • Contrastive learning with translation pairs as positives
      • Despite the name "ranking", what it actually does is maximize the similarity
        of the positives (the correct translation pairs)
      [26] Guo+: Effective Parallel Corpus Mining using Bilingual Sentence Embeddings, WMT '18
      Components of LaBSE: Translation ranking task
      42
      [Figure: for the Japanese sentence 「我輩は猫である。」, the English translation
      "I am a cat." is the positive (pulled closer, similarity maximized), while
      "Nice to meet you." is a negative (pushed apart, similarity minimized)]

  46. • First turn the translation pairs into sentence embeddings
      • Compute similarities for every positive and negative combination
        • This yields a similarity matrix
      • Maximize the similarity of the positives
        • = the diagonal of the similarity matrix is the gold label
        • Normalize each row with a softmax
        • Think of it as a 1-vs-N task repeated N times
      • The loss function (reconstructed from the paper; φ is the dot product of the embeddings):
        L = −(1/N) Σᵢ log( exp(φ(xᵢ, yᵢ)) / Σₙ exp(φ(xᵢ, yₙ)) )
      Components of LaBSE: Translation ranking task
      46
      [Figure: Japanese sentences (「私はペンです。」「我輩は猫である。」「はじめまして。」
      「私は完璧な人間です。」) embedded against their English counterparts ("I am a pen.",
      "I'm a cat.", "Nice to meet you.", "I'm a perfect human."); the diagonal entries are the
      positives, e.g. similarity 0.98 for a positive vs. 0.24 for a negative]

  51. • Other pairs in the same batch serve as negatives
        • Known as in-batch negatives
        • The similarity matrix becomes a square
          (batch_size × batch_size) matrix
        • (Extra negatives can also be added)
      • A single softmax makes the loss asymmetric
        • To fix this, the losses in the two directions (rows and columns) are summed
      Components of LaBSE: Translation ranking task
      51
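A minimal numpy sketch of the bidirectional in-batch loss described on the preceding slides; similarities are dot products, and the embeddings passed in are arbitrary stand-ins:

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def translation_ranking_loss(src_emb, tgt_emb):
    # src_emb, tgt_emb: (batch_size, dim) embeddings of source sentences and
    # their translations. Dot products of every pair give a square
    # (batch_size x batch_size) similarity matrix whose diagonal holds the
    # true translation pairs; every other entry is an in-batch negative.
    sim = src_emb @ tgt_emb.T
    diag = np.arange(sim.shape[0])
    # A single softmax direction makes the loss asymmetric, so the row-wise
    # (source -> target) and column-wise (target -> source) losses are summed.
    loss_fwd = -log_softmax(sim, axis=1)[diag, diag].mean()
    loss_bwd = -log_softmax(sim, axis=0)[diag, diag].mean()
    return float(loss_fwd + loss_bwd)

# Toy batch: when the translations line up on the diagonal, the loss is small
aligned_loss = translation_ranking_loss(np.eye(4), 5 * np.eye(4))
```

Shuffling the target rows (so the true pairs leave the diagonal) makes the loss jump, which is exactly the signal the ranking task trains on.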

  54. • Replace the similarity function φ with φ′, which introduces a margin
        • The margin is applied only to the positives
      • Positives are pulled closer together; negatives are pushed further apart
      • The modified loss function (reconstructed from the paper) becomes:
        φ′(xᵢ, yⱼ) = φ(xᵢ, yⱼ) − m if i = j, else φ(xᵢ, yⱼ)
        L = −(1/N) Σᵢ log( exp(φ(xᵢ, yᵢ) − m) / (exp(φ(xᵢ, yᵢ) − m) + Σₙ≠ᵢ exp(φ(xᵢ, yₙ))) )
      [27] Yang+: Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax, IJCAI '19
      Components of LaBSE: Additive Margin Softmax (AMS) [27]
      54
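Additive margin softmax is a one-line change to the in-batch translation-ranking loss: subtract the margin from the diagonal before the softmax. A numpy sketch with stand-in embeddings and the paper's margin of 0.3:

```python
import numpy as np

def log_softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def ams_ranking_loss(src_emb, tgt_emb, margin=0.3):
    # Additive margin softmax: subtract the margin from the positive
    # (diagonal) similarities only, before the softmax. A positive must then
    # beat every negative by at least the margin, which pulls positives
    # closer together and pushes negatives further apart.
    sim = src_emb @ tgt_emb.T
    n = sim.shape[0]
    sim = sim - margin * np.eye(n)
    diag = np.arange(n)
    loss_fwd = -log_softmax(sim, axis=1)[diag, diag].mean()
    loss_bwd = -log_softmax(sim, axis=0)[diag, diag].mean()
    return float(loss_fwd + loss_bwd)

# With the margin, the same embeddings incur a larger loss than without it,
# so training keeps tightening translation pairs even once they already
# outscore all the negatives.
with_margin = ams_ranking_loss(np.eye(4), 5 * np.eye(4), margin=0.3)
no_margin = ams_ranking_loss(np.eye(4), 5 * np.eye(4), margin=0.0)
```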

  55. Masked Language Modeling (MLM)
      • Train the model to predict the words that fill the masked positions
        • Self-supervised learning that acquires general linguistic knowledge
      • Familiar from BERT [00]
      Components of LaBSE: MLM and TLM pre-training
      55
      [Diagram: BERT predicts the masked token in
      "[CLS] 我輩 は [MASK] である 。 [SEP]"]

  56. Translation Language Modeling (TLM) [28]
      • Concatenate a translation pair and run MLM on it
        • Expected to learn the alignment between the two languages
      [28] Conneau+: Cross-lingual Language Model Pretraining, NeurIPS '19
      Components of LaBSE: MLM and TLM pre-training
      56
      [Diagram: the pair 「我輩 は [MASK] である 。」 and "[MASK] am a cat" are
      concatenated with [/s] separators and fed to a Transformer, which predicts
      the masked tokens (e.g. "I", "cat")]

  57. • TLM is an extension of MLM
        • With minor changes it can be trained in the same way as MLM
      • LaBSE trains with MLM and TLM combined
      [28] Conneau+: Cross-lingual Language Model Pretraining, NeurIPS '19
      Components of LaBSE: MLM and TLM pre-training
      57
      Note: to improve multilinguality, this work does not use language embeddings

  58. • Obtain a sentence embedding model by solving a seq2seq translation task with BiLSTMs
        • The encoder extracts language-independent sentence representations
        • The decoder generates sentences from the embedding plus an auxiliary language-ID input
      • Strong on cross-lingual NLI (XNLI) and multilingual retrieval tasks; supports 97 languages
      [29] Artetxe+: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, TACL '18
      Related work: LASER [29]
      58

  59. • Obtain a high-performance multilingual sentence embedding model through multi-task learning
        • The translation ranking task is among the training tasks
        • (Probably) no pre-training such as self-supervised learning is performed
      [30] Yang+: Multilingual Universal Sentence Encoder for Semantic Retrieval, ACL: System Demonstrations '20
      Related work: m-USE [30]
      59

  60. Monolingual Data
      • Collected from CommonCrawl and Wikipedia; 17B (17 billion) sentences
      • Preprocessed; used only for pre-training (MLM)
      Bilingual Translation Pairs
      • Collected by bitext mining of web pages; 6B (6 billion) pairs
      • To counter data imbalance, each language is capped at 100M sentences
      • Low-quality data is filtered using human evaluation of a subset
      • Used for pre-training (MLM & TLM) and for training the dual encoder
      Training data
      60

  61. • Contrastive methods generally perform better with more negatives
        • With in-batch negatives, the number of negatives is batch_size − 1
        • Increasing the batch size consumes correspondingly more memory
      • The similarity computation is split across accelerators (TPUs in this work)
      Training tricks
      61
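The memory trick can be illustrated in shape terms: instead of materialising the full batch_size × batch_size similarity matrix in one place, compute it in row shards. The single-process helper below is a stand-in for the per-accelerator split, not the paper's TPU code:

```python
import numpy as np

def sharded_similarities(x, y, shard_size):
    # Each shard computes only a (shard_size x batch_size) slice of the
    # similarity matrix, bounding the peak memory per device; concatenating
    # the slices reproduces the full matrix exactly.
    shards = [x[i:i + shard_size] @ y.T for i in range(0, len(x), shard_size)]
    return np.concatenate(shards, axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 4))
full = x @ y.T
sharded = sharded_similarities(x, y, shard_size=2)  # identical values, computed in slices
```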

  62. United Nations (UN)
      • Retrieve the corresponding document in another language from English (Precision@1 = accuracy)
      • 5 language pairs (en-fr, en-es, en-ru, en-ar, en-zh); 86,000 sentences
      Tatoeba
      • Retrieve the corresponding English translation from a non-English sentence (average accuracy)
      • A corpus of example sentences and their translations collected from https://tatoeba.org
      • 112 languages; each language has 1,000 sentences with corresponding English translations
      • Following prior work, a subset restricted to 36 languages is also evaluated
      BUCC
      • Find translation pairs in monolingual corpora (Precision, Recall, F1)
      • 4 language pairs (fr-en, de-en, ru-en, zh-en)
      Evaluation tasks: bitext retrieval
      62

  63. SentEval
      • Evaluates classification performance on downstream tasks
      • English only; although LaBSE is multilingual, its performance as a monolingual model is also assessed
      Semantic Textual Similarity (STS)
      • Evaluates the correlation between model-computed similarities and human ratings
      • English only; again an assessment of the multilingual model as a monolingual one
      • Covered in the post-hoc analysis section
      Evaluation tasks: sentence embedding quality
      63

  64. • Vocabulary size
        • mBERT Vocab: the same as multilingual BERT (mBERT) (119,547)
        • Customized Vocab: built from scratch with per-language data-imbalance
          countermeasures (501,153)
      • Pre-training (PT)
        • Whether to perform MLM+TLM pre-training
        • If not, only the translation ranking task is run
      • Additive Margin Softmax (AMS)
        • Whether to use a margin in the translation ranking task
      Experimental conditions
      64

  65. • The L2-normalized [CLS] vector is used as the sentence embedding
      • optimizer = AdamW, learning rate = 1e-3, seq length = 128
      Pre-training
      • batch size: 8192
      Translation ranking task
      • batch size: 4096
      • w/ pre-training: 50k steps, w/o pre-training: 500k steps
      • margin value: 0.3
      Experimental setup
      65

  66. • LaBSE (Customized Vocab + AMS + PT) sets a new SOTA
        • Yang et al. is a bilingual model, requiring a separate model per language
        • LaBSE handles 109+ languages with a single model
      Results: United Nations (UN) & Tatoeba
      66
      For some reason there are no results for Base w/ Customized Vocab

  67. • On UN, pre-training (PT) is effective with both vocabularies
        • It also speeds up training on the translation ranking task
      Results: United Nations (UN) & Tatoeba
      67

  68. • On Tatoeba, pre-training (PT) is not effective with the mBERT Vocab
        • The cause is that many tokens are replaced by [UNK] under the mBERT Vocab
        • For example, in one language 71% of the tokens become [UNK]
      Results: United Nations (UN) & Tatoeba
      68

  69. • Performance trends differ between UN and Tatoeba
        • UN is the larger-scale bitext retrieval benchmark
        • Detecting fine-grained differences between models requires a large-scale benchmark
      Results: United Nations (UN) & Tatoeba
      69

  70. • Compared with prior work, LaBSE achieves the best performance in almost all settings
      • A single model covers all the language pairs
        • Yang et al. requires four separate models
      Results: BUCC
      70

  71. • Transfer performance to English monolingual downstream tasks is not especially high
        • Still roughly on par with English-only sentence embedding models
        • Probably sufficient for a multilingual model
      Results: SentEval
      71

  72. • Observe how UN performance changes with the margin value
        • The margin contributes to the performance gains
      • The gains level off at a margin of about 0.3
        • Up to that point the improvement is consistent
        • Performance improves for all the models
      Analysis: Additive Margin Softmax
      72
      I am curious whether the curve stays converged when the margin is pushed beyond 0.4…

  73. • Pre-training (PT) improves performance across the board
      • With PT, performance already converges within 50K training steps
        • 50K steps = 200M examples
        • Less parallel data suffices
      • PT helps both performance and convergence speed
        (= fewer training examples are needed)
      Analysis: Usefulness of pre-training
      73

  74. • Analyze performance on Tatoeba's low-resource languages
        • LaBSE stays strong even when low-data languages are mixed in
      • Analyze Tatoeba performance on languages for which no en-xx
        parallel data exists
        • About 30% exceed 75% accuracy — quite high
        • Beyond similarity between languages, this is likely an effect
          of training on large-scale data
      Analysis: Performance on low-resource languages
      74

  75. • Analyze LaBSE's performance on English STS
      • m-USE, which is trained on NLI, performs very well
        • Even higher than SBERT?
      • LaBSE's numbers are on the low side
      • Training on translation pairs
        • excels at detecting semantic equivalence, but
        • cannot finely distinguish how much meanings differ
      Analysis: STS
      75

  76. • Conduct an experiment mining parallel data from CommonCrawl
      • Train a machine translation model on the mined parallel corpus and evaluate it
        • An evaluation of the applications and usefulness of the multilingual embedding model
      • Preprocess the English, Chinese, and German portions of CommonCrawl
        • First turn every sentence into an embedding
        • For every non-English sentence, match its nearest neighbor in embedding space
        • Discard pairs with similarity below 0.6
      • Evaluate translation quality with BLEU on the WMT benchmark datasets
      Additional experiments: Mining Parallel Text from CommonCrawl
      76
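Per language pair, the mining loop above reduces to a nearest-neighbour search with a similarity cutoff. A toy numpy sketch; the embeddings and the helper name `mine_pairs` are illustrative, not from the paper's code:

```python
import numpy as np

def mine_pairs(src_emb, tgt_emb, threshold=0.6):
    # For each non-English sentence (rows of src_emb), find the nearest
    # English sentence (rows of tgt_emb) by cosine similarity, keeping the
    # pair only if the similarity clears the threshold (0.6 in the deck).
    s = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = s @ t.T
    nearest = sim.argmax(axis=1)
    scores = sim[np.arange(len(s)), nearest]
    return [(i, int(j)) for i, (j, v) in enumerate(zip(nearest, scores)) if v >= threshold]

# Toy embeddings: sentences 0 and 1 have close English neighbours, while
# sentence 2's best match falls below the cutoff and is discarded.
src = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
tgt = np.array([[0.9, 0.1], [0.2, 0.8], [-1.0, 1.0]])
pairs = mine_pairs(src, tgt)
```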

  77. • Compared against systems trained on human-curated parallel data
        • On en-de News, only 2.8 BLEU points below prior work
        • On en-zh News, roughly equivalent performance
      • On TED, also about on par with existing systems
        • Impressive to reach parity while training on automatically mined data
      Additional experiments: Mining Parallel Text from CommonCrawl
      77

  78. • Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
      • Multilingual pre-training → contrastive learning of sentence embeddings on a translation corpus
        • MLM + Translation Language Modeling → Additive Margin Softmax
      • A range of evaluation experiments
        • Greatly improves cross-lingual retrieval performance
        • Especially strong on low-resource languages
        • Monolingual STS/SentEval scores are not high
        • Pre-training reduces the amount of parallel data needed
        • AMS has a large impact on performance
      • The pre-trained model is publicly available
      Summary: Language-agnostic BERT Sentence Embedding
      78
      https://arxiv.org/abs/2007.01852

  79. Appendix

  80. • (The work first appeared in 2020, but) it is a little dated
        • SimCSE and DistilCSE are not used as comparison targets
      • There are quite a few typos
      • Some standard practices of contrastive sentence embedding methods are not used
        • The dot product is used as the sentence similarity
        • No temperature parameter is used
        • …or so I thought — they apply L2 normalization followed by scaling,
          which is essentially the same thing
      A few points that bother me
      80
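The last observation can be checked numerically: scaling the dot product of L2-normalised embeddings gives the same number as cosine similarity divided by a temperature of 1/scale. Toy vectors; the scale value is illustrative:

```python
import numpy as np

u, v = np.array([3.0, 4.0]), np.array([1.0, 2.0])
scale = 20.0  # plays the role of 1 / temperature

def unit(w):
    return w / np.linalg.norm(w)

# L2 normalization followed by scaling ...
scaled_dot = scale * float(unit(u) @ unit(v))

# ... equals cosine similarity with a temperature of 1/scale
cosine = float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
with_temperature = cosine / (1.0 / scale)
```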

  81. • Recently, evaluation usually uses Spearman rather than Pearson
        • There is an argument that Pearson is not a very good evaluation metric [31]
      • Note that, as long as Spearman is used, STS is a "ranking task"
      • Tuning hyperparameters on the STS Benchmark dev set has recently become popular
        • Evaluate on dev every 250 steps and run the test with the best checkpoint (SimCSE)
        • This seems likely to overfit STS, so it does not strike me as a great policy…
        • Is "you may not train on it, but you may use it as dev!" a good setting?
      • STS evaluation protocols sometimes differ between papers, so caution is needed
        • Metrics and procedures used to vary (they have recently been unified)
        • Appendix B of the SimCSE paper [11] describes this; recommended reading
      [31] Reimers+: Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity, COLING '16
      Introduction: Asides on STS
      81

  82. • In fact, things other than the meaning of a sentence may be embedded
        • Word choice, style (formality, honorifics), closeness of a question to its answer, etc.
      • A sentence embedding space is characterized by what aspects of sentences it pulls together
        • How the distance is defined determines the nature of the embeddings
      • Correspondence between the sentences pulled together in training and the "distance"
        this induces (the presenter's conjecture):
        • Entailed sentence pairs: attend to meaning rather than surface similarity
        • Question-answer pairs: attend to the content linking question and answer
          rather than the meaning of each sentence itself
        • Translation pairs: attend to sentence meaning while ignoring which language it is in
      Introduction: What sentence embeddings embed (the presenter's musings)
      82

  83. [08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP '17
      [10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
      [11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
      [32] Hill+: Learning Distributed Representations of Sentences from Unlabelled Data, NAACL '16
      [33] Wang+: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning, EMNLP Findings '21
      [34] Li+: OPTIMUS: Organizing Sentences via Pre-trained Modeling of a Latent Space, EMNLP '20
      Introduction: What sentence embeddings embed (the presenter's musings)
      83
      Methods that embed the meaning of a sentence
      • InferSent: train an LSTM with NLI classification in a dual-encoder (Siamese) structure [08]
      • Sentence-BERT: fine-tune BERT with NLI classification in a dual-encoder structure [10]
      • Supervised SimCSE: contrastive learning with NLI entailment pairs as positives [11]
      Methods that embed enough information to reconstruct the sentence
      • SDAE: train an LSTM to denoise and reconstruct the input sentence [32]
      • TSDAE: train a Transformer to denoise and reconstruct the input sentence [33]
      • Optimus: a big VAE [34]

  84. [06] Kiros+: Skip-Thought Vectors, NIPS '15
      [09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
      [35] Tsukagoshi+: DefSent: Sentence Embeddings using Definition Sentences, ACL '21
      [36] Wu+: DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings, ARR '22
      [37] Wu+: DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings, arXiv '21 (same content as [36])
      Introduction: What sentence embeddings embed (the presenter's musings)
      84
      Methods that embed information about the surrounding sentences
      • Skip-Thought: trained generatively to reconstruct the preceding and following sentences [06]
      • USE: Skip-Thought-style unsupervised learning + supervised learning on classification tasks [09]
      Methods that embed sentence meaning the way a word's meaning is composed from its definition
      • DefSent: trained to predict the defined word from its definition sentence [35]
      Methods that are hard to categorize
      • Unsupervised SimCSE: contrastive learning with differently-dropped-out copies of a sentence as positives [11]
      • DistilCSE: distillation via contrastive learning with teacher and student embeddings as positives [36, 37]

  85. • Train a multilingual sentence embedding model by knowledge distillation
        from a monolingual sentence embedding model
        • A translation corpus is used to directly pull together embeddings of different languages
      • Good performance on cross-lingual STS
      • Also strong on monolingual STS
        • To be expected, since it uses NLI?
      • Smaller language bias than LaBSE
      • A nice property: it produces a multilingual model whose embedding space
        has a structure similar to the teacher model's
        • Doing it sequentially with a single model breaks the space
      [38] Reimers+: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation, EMNLP '20
      A related paper:
      Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [38]
      85