
[Reading Group] Language-agnostic BERT Sentence Embedding

Slides explaining the paper on Language-agnostic BERT Sentence Embedding (LaBSE), a multilingual sentence embedding method.

Hayato Tsukagoshi

May 24, 2022


Transcript

  1. Language-agnostic BERT Sentence Embedding
     M2, Graduate School of Informatics, Nagoya University, Japan
     Presenter: Hayato Tsukagoshi
     Paper: Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang. ACL 2022. URL: https://arxiv.org/abs/2007.01852
  2. LaBSE: Language-agnostic BERT Sentence Embedding
     • Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
     • Multilingual pre-training, then contrastive learning of sentence embeddings on a translation corpus
       • MLM + Translation Language Modeling (TLM) → Additive Margin Softmax
     • Broad evaluation experiments
       • Greatly improves cross-lingual retrieval performance
       • Especially strong on low-resource languages
       • Monolingual STS/SentEval scores are not high
     Reasons for selecting this paper
     • Rich in sentence-embedding topics and evaluation settings
     • An instructive paper on (multilingual) sentence embeddings
     https://arxiv.org/abs/2007.01852
  3. Introduction

  4.–6. Introduction: Sentence embedding
     • A dense vector representation of a natural-language sentence
     • Distances between vectors represent how close sentences are in meaning
     Example sentences and their embeddings:
       "A child is heading home."               [0.1, 0.2, ...]
       "A child is heading home from school."   [0.1, 0.3, ...]
       "A child is in the library."             [0.9, 0.8, ...]
       "A child is walking in the afternoon."   [0.5, 0.7, ...]
     Semantically similar sentences are distributed close together in the embedding space; the distance between vectors expresses their semantic relation.
  7.–8. Introduction: Sentence embedding
     • Origin of the name "embedding"
       • A word sequence is a sequence of extremely high-dimensional (vocabulary-sized) vectors
       • A sentence is instead represented by a lower-dimensional vector
       • The term apparently comes from manifold theory
     Usefulness and applications
     • Similar sentence/document retrieval and clustering
     • Lightweight text classification and feature extraction (reused in other tasks)
     • Question answering via dense vector retrieval (Dense Passage Retrieval)
     • Example-based translation using translation memories
     • The range of applications is very broad
  9. Introduction: Representative sentence embedding methods
     Before BERT
     • Obtain sentence embeddings as (weighted) averages of word embeddings
       • p-mean [01], SWEM [02], DynaMax [03], SIF [04], uSIF [05], etc.
     • Build dedicated sentence embedding models
       • Skip-Thought [06], SCDV [07], InferSent [08], USE [09], etc.
     [01] Rückle+: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations, arXiv ’18
     [02] Shen+: Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL ’18
     [03] Zhelezniak+: Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, ICLR ’19
     [04] Arora+: A Simple but Tough-to-Beat Baseline for Sentence Embeddings, ICLR ’17
     [05] Ethayarajh: Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline, Rep4NLP ’18
     [06] Kiros+: Skip-Thought Vectors, NIPS ’15
     [07] Mekala+: SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations, ACL ’17
     [08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP ’17
     [09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
  10. Introduction: Representative sentence embedding methods — Sentence-BERT [10]
     • Fine-tunes BERT on Natural Language Inference (NLI)
       • A pioneering study of sentence embedding models built on pre-trained language models (PLMs)
       • Essentially InferSent done with BERT
     • Set a new state of the art (SOTA) by a wide margin at the time
     • SimCSE (described later) has become a near-strict upgrade, so it may see less use going forward?
     [10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP ’19
     (Figure cited from the paper)
  11. Introduction: Representative sentence embedding methods — SimCSE [11]
     • Fine-tunes BERT with contrastive learning
       • Unsupervised SimCSE: embed the same sentence twice and contrast the two encodings
       • Supervised SimCSE: use entailment sentence pairs as positives
     • SOTA by a large margin; follow-up work keeps appearing
     [11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP ’21
     (Figure cited from the paper; slides from an earlier reading-group session on SimCSE are available separately)
  12.–14. Introduction: Evaluating sentence embeddings
     • How should the "quality" of sentence embeddings be evaluated?
       • The discussion is (I think) far from settled
       • The question of what kind of sentence embeddings should be produced remains open
       • Still, some criterion is needed for evaluation
     Evaluation benchmarks
     • Semantic Textual Similarity (STS): correlation between human-rated and model-computed sentence similarity
     • SentEval: performance on downstream tasks such as text classification [12, 13]
     • SentGLUE: GLUE [14] restricted to approaches that use sentence embeddings [15]
     • Clustering, text retrieval
     STS and SentEval are the most widely used.
     [12] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC ’18
     [13] Conneau+: What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, ACL ’18
     [14] Wang+: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, EMNLP Workshop BlackboxNLP ’18
     [15] Ni+: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models, CoRR ’21
  15.–17. Introduction: Semantic Textual Similarity (STS)
     • Evaluates how well a sentence embedding model captures meaning, via correlation with human judgments
     • Each sentence pair is manually annotated with a semantic similarity score
     • The model is scored by the correlation between human ratings and model-computed similarities
       • Pearson (product-moment) correlation coefficient
       • Spearman rank correlation coefficient
     • Sentence embedding evaluation uses the unsupervised setting
       • No training on STS data
       • A pre-trained model is evaluated as-is
  18.–19. Introduction: Unsupervised STS
     Evaluation procedure for unsupervised STS
     ① Prepare a dataset of sentence pairs
     ② Prepare a sentence embedding model
     ③ Turn each sentence of a pair into a sentence vector
     ④ Compute the similarity of each sentence-vector pair
       • Cosine similarity is the usual choice
     ⑤ Compute the (rank) correlation with the human-rated similarities
  20.–27. Introduction: Computing Spearman's rank correlation for STS
     Sentence A is "A man is playing a guitar." in every pair; human score vs. the similarity computed by two models:
       Sentence B                                Human   Model 1   Model 2
       The man is playing the guitar.            4.909   0.985     0.978
       A guy is playing an instrument.           3.800   0.646     0.895
       A man is playing a guitar and singing.    3.200   0.874     0.977
       The girl is playing the guitar.           2.250   0.747     0.831
       A woman is cutting vegetable.             0.000   0.290     0.595
     Rank the sentence pairs by similarity within each column (Human: 1 2 3 4 5; Model 1: 1 4 2 3 5; Model 2: 1 3 2 4 5), then compute the correlation between the gold ranking and each predicted ranking by plugging the ranks into the formula:
       r1 = 1 − 6/(5(5² − 1)) · {(1−1)² + (2−4)² + (3−2)² + (4−3)² + (5−5)²} = 1 − (6/120)(0 + 4 + 1 + 1 + 0) = 0.7
       r2 = 0.9
     Model 2 is the better model.
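The rank-correlation computation above fits in a few lines. The scores are the ones from the slide's table; `ranks` and `spearman` are small helpers written for this sketch (real evaluations would typically use `scipy.stats.spearmanr`, which also handles ties).

```python
def ranks(scores):
    """Rank items in descending order of score (1 = most similar). No ties assumed."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(a, b):
    """Spearman's rho via the rank-difference formula: 1 - 6*sum(d^2)/(n(n^2-1))."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

human  = [4.909, 3.800, 3.200, 2.250, 0.000]
model1 = [0.985, 0.646, 0.874, 0.747, 0.290]
model2 = [0.978, 0.895, 0.977, 0.831, 0.595]

print(spearman(human, model1))  # 0.7
print(spearman(human, model2))  # 0.9
```

This reproduces r1 = 0.7 and r2 = 0.9 from the slides.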
  28.–29. Introduction: Evaluation datasets for STS
     English datasets
     • STS12, 13, 14, 15, 16 [16, 17, 18, 19, 20]
     • STS Benchmark (test set) [21]
     • SICK-R [22]
     Japanese datasets
     • JSICK [23]
     • JSTS [24]
     STS12–16 are each a collection of small datasets; the correlation coefficient is usually computed over the pooled "sub"-datasets. The final evaluation is most often the average score over STS12–16, STS Benchmark, and SICK-R.
     [16] Agirre+: SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, *SEM ’12
     [17] Agirre+: *SEM 2013 shared task: Semantic Textual Similarity, *SEM ’13
     [18] Agirre+: SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, SemEval ’14
     [19] Agirre+: SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, SemEval ’15
     [20] Agirre+: SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation, SemEval ’16
     [21] Cer+: SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, SemEval ’17
     [22] Marelli+: A SICK cure for the evaluation of compositional distributional semantic models, LREC ’14
     [23] Yanaka+: JSICK: A Japanese Dataset for Compositional Inference and Similarity, JSAI Annual Meeting ’21
     [24] Kurihara+: JGLUE: A Japanese Language Understanding Benchmark, NLP Annual Meeting ’22
  30.–31. Introduction: SentEval [25]
     • A toolkit collecting downstream tasks such as text classification
     • A classifier taking sentence embeddings as input is trained; classification performance measures embedding quality
     Task list:
       Task   Type                            #train   #test   #class
       MR     movie review                    11,000   11,000  2
       CR     product review                   4,000    4,000  2
       SUBJ   subjectivity status             10,000   10,000  2
       MPQA   opinion-polarity                11,000   11,000  2
       SST-2  binary sentiment analysis       67,000    1,800  2
       TREC   question-type classification     6,000      500  6
       MRPC   paraphrase detection             4,100    1,700  2
     [25] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC ’18
  32. Introduction: SentEval
     Evaluation procedure
     ① Prepare a sentence embedding model with frozen parameters
     ② Train a classifier that takes sentence embeddings as input
     ③ Judge embedding quality from the classifier's performance
     • Assumption: higher classification performance means "better sentence embeddings"
     • The classifier is usually logistic regression
       • i.e., classification by a weighted sum of the embedding dimensions
     • Evaluates the performance of a "pre-trained" sentence embedding model
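A toy version of the protocol above, assuming random vectors as stand-ins for frozen encoder outputs (SentEval itself wraps real encoders and uses a tuned logistic-regression classifier; this is only a minimal sketch of the idea):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 16, 200

# Frozen "sentence embeddings" and binary labels (synthetic stand-in data).
X = rng.normal(size=(n, dim))
w_true = rng.normal(size=dim)
y = (X @ w_true > 0).astype(float)

# Logistic regression on top: classify by a weighted sum of embedding dimensions.
w = np.zeros(dim)
for _ in range(500):
    z = np.clip(X @ w, -30, 30)          # clip logits to avoid overflow
    p = 1 / (1 + np.exp(-z))             # sigmoid
    w -= 0.5 * X.T @ (p - y) / n         # gradient step on the log loss

acc = ((X @ w > 0) == (y == 1)).mean()
print(f"train accuracy: {acc:.2f}")
```

Under the SentEval assumption, a higher accuracy here would indicate "better" embeddings; the encoder itself is never updated.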
  33. Proposed method: LaBSE

  34. LaBSE: Language-agnostic BERT Sentence Embedding (overview, repeated from slide 2)
     • Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
     • Multilingual pre-training, then contrastive learning of sentence embeddings on a translation corpus
       • MLM + Translation Language Modeling (TLM) → Additive Margin Softmax
     • Broad evaluation experiments: greatly improved cross-lingual retrieval, especially for low-resource languages; monolingual STS/SentEval not high
     https://arxiv.org/abs/2007.01852
  35. Outline
     • Comparison with prior work
     • Components of LaBSE
       • Dual-encoder architecture
       • Translation ranking task
       • MLM and TLM pre-training
     • Related work on multilingual sentence embeddings
     • Experimental setup, training tricks, evaluation methods
     • Experimental results
     • Analysis
     • Additional experiments
  36. LaBSE: Comparison with prior work
     • Pre-training followed by fine-tuning to build a multilingual sentence embedding model
     • High performance on low-resource languages
       • Also strong on language pairs absent from the training data
     • The individual methods and techniques are almost all from prior work
       • The contribution of this work is a good combination + large-scale data + engineering for large-scale training
  37. Components of LaBSE
     Dual-encoder architecture
     • A training strategy that is standard for sentence embedding methods
     • Sentence-BERT, SimCSE, and others use the same scheme
     Translation ranking task
     • Train the similarity of a translation pair to be higher than that of all other sentence pairs
     • Training is refined with additive margin softmax
     MLM and TLM pre-training
     • Masked Language Modeling (MLM)
     • Translation Language Modeling (TLM)
  38.–39. Components of LaBSE: Dual-encoder architecture
     • Two encoders produce the sentence embedding representations, and the loss is computed between their outputs
       • The encoders usually share weights (i.e., they are the same model)
       • Also called a Siamese network
     • Contrast with the encoder-decoder architecture
       • Typical Enc-Dec tasks: surrounding-sentence generation, translation generation, denoising autoencoding
       • Typical dual-encoder tasks: recognizing textual entailment, contrastive learning
  40.–42. Components of LaBSE: Translation ranking task
     • Proposed by Guo et al. [26]
     • Train the similarity of a translation pair to be higher than that of other sentence pairs
       • Contrastive learning with translation pairs as positives
     • Despite the name "ranking," what it actually does is maximize the similarity of positives (correct translation pairs)
     • Example: 「吾輩は猫である。」 (Ja) ↔ "I am a cat." (En) is a positive pair (pull closer, maximize similarity); "Nice to meet you." (En) is a negative (push apart, minimize similarity)
     [26] Guo+: Effective Parallel Corpus Mining using Bilingual Sentence Embeddings, WMT ’18
  43.–46. Components of LaBSE: Translation ranking task
     • First, turn the translation pairs into sentence embeddings
     • Compute similarities for every combination of positives and negatives
       • This yields a similarity matrix (in the slide example, rows 「私はペンです。」「吾輩は猫である。」「はじめまして。」「私は完璧な人間です。」 against columns "I am a pen." / "I'm a cat." / "Nice to meet you." / "I'm a perfect human.")
     • Maximize the similarity of the positives
       • i.e., the diagonal of the similarity matrix is the gold answer
     • Normalize each row (→) with a softmax
       • Intuitively, one-vs-N classification repeated N times
     • The loss function, where φ is the inner product of the two embeddings:
       L = −(1/N) Σᵢ log [ e^{φ(xᵢ, yᵢ)} / Σₙ e^{φ(xᵢ, yₙ)} ]
  47.–51. Components of LaBSE: Translation ranking task
     • The other pairs in the same batch serve as negatives
       • Known as in-batch negatives
       • The similarity matrix becomes a square (batch_size × batch_size) matrix
       • (Additional negatives can also be added)
     • With a row-wise softmax the loss is asymmetric
       • To fix this, the losses in both directions (→ and ↓) are summed
  52.–54. Components of LaBSE: Additive Margin Softmax (AMS) [27]
     • Replace the similarity function φ with a margin-augmented φ′
       • The margin m is applied only to positives: φ′(xᵢ, yⱼ) = φ(xᵢ, yⱼ) − m if i = j, else φ(xᵢ, yⱼ)
     • Positives cluster more tightly, negatives are pushed further apart
     [27] Yang+: Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax, IJCAI ’19
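The bidirectional in-batch loss with an additive margin can be sketched in NumPy. This is not the authors' code: `src`/`tgt` are stand-ins for L2-normalized dual-encoder outputs, and `ranking_loss` follows the recipe above (subtract the margin on the diagonal, cross-entropy with the diagonal as target, summed over both directions).

```python
import numpy as np

def ranking_loss(src, tgt, margin=0.3):
    """Bidirectional translation-ranking loss with additive margin for one batch."""
    sim = src @ tgt.T                          # (batch, batch) similarity matrix
    sim = sim - margin * np.eye(len(src))      # margin applied only to positives (diagonal)

    def softmax_ce(s):
        # cross-entropy where the gold answer is the diagonal of the matrix
        logp = s - np.log(np.exp(s).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))

    # sum the row-wise (src -> tgt) and column-wise (tgt -> src) losses
    return softmax_ce(sim) + softmax_ce(sim.T)

rng = np.random.default_rng(0)
src = rng.normal(size=(4, 8)); src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt = src + 0.1 * rng.normal(size=(4, 8)); tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
print(ranking_loss(src, tgt))
```

Note that the margin makes the loss strictly larger for the same embeddings, which is exactly the extra pressure that pulls positives together and pushes negatives apart.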
  55. Components of LaBSE: MLM and TLM pre-training
     Masked Language Modeling (MLM)
     • Train the model to predict the words behind masked positions
       • Self-supervised learning that acquires general-purpose linguistic knowledge
     • Familiar from BERT [00]
     • Example input to BERT: [CLS] 吾輩 は [MASK] である 。 [SEP] → predict 猫
  56. Components of LaBSE: MLM and TLM pre-training
     Translation Language Modeling (TLM) [28]
     • Concatenate a translation pair and do MLM on it
       • Expected to learn the alignment between the two languages
     • Example input to the Transformer: 吾輩 は [MASK] である 。 [/s] I [MASK] a cat . [/s] → predict 猫 and am
     [28] Conneau+: Cross-lingual Language Model Pretraining, NeurIPS ’19
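A toy illustration of what a TLM input looks like: the translation pair is concatenated into one sequence and tokens from both languages are masked, so the model can use the other language to recover them. Tokenization here is just whitespace splitting, and `tlm_example` is a hypothetical helper for illustration only.

```python
import random

def tlm_example(src_tokens, tgt_tokens, mask_rate=0.3, seed=0):
    """Concatenate a translation pair with [/s] separators and mask random tokens."""
    rng = random.Random(seed)
    seq = src_tokens + ["[/s]"] + tgt_tokens + ["[/s]"]
    masked, targets = [], {}
    for i, tok in enumerate(seq):
        if tok != "[/s]" and rng.random() < mask_rate:
            masked.append("[MASK]")
            targets[i] = tok        # positions the model must predict
        else:
            masked.append(tok)
    return masked, targets

seq, targets = tlm_example("吾輩 は 猫 で ある 。".split(), "I am a cat .".split())
print(seq)      # mixed-language sequence with [MASK] tokens
print(targets)  # position -> original token to predict
```

Because masked Japanese tokens can be recovered from the English half (and vice versa), the model is pushed to learn cross-lingual alignment.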
  57. Components of LaBSE: MLM and TLM pre-training
     • TLM is an extension of MLM
       • With minor changes it can be trained the same way as MLM
     • LaBSE combines MLM and TLM for pre-training
     • Note: to improve multilinguality, this work does not use language embeddings
     [28] Conneau+: Cross-lingual Language Model Pretraining, NeurIPS ’19
  58. Related work: LASER [29]
     • Obtains a sentence embedding model by solving a BiLSTM seq2seq translation task
       • The encoder extracts a language-independent sentence representation
       • The decoder generates sentences given the sentence embedding and an auxiliary language ID
     • Strong on cross-lingual NLI (XNLI) and multilingual retrieval tasks; covers 97 languages
     [29] Artetxe+: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, TACL ’19
  59. Related work: m-USE [30]
     • Obtains a high-performing multilingual sentence embedding model via multi-task learning
       • The translation ranking task is among the training tasks
       • (Probably) no self-supervised pre-training
     [30] Yang+: Multilingual Universal Sentence Encoder for Semantic Retrieval, ACL: System Demonstrations ’20
  60. Training data
     Monolingual data
     • Collected from CommonCrawl and Wikipedia; 17B sentences
     • Preprocessed; used only for pre-training (MLM)
     Bilingual translation pairs
     • Collected by mining translations (bitext mining) from web pages; 6B pairs
     • To counter data imbalance, each language is capped at 100M sentences
     • Low-quality data filtered using human evaluation of a subset
     • Used for pre-training (MLM & TLM) and for training the dual encoder
  61. Training tricks
     • Contrastive methods generally perform better with more negatives
       • With in-batch negatives, the number of negatives is batch_size − 1
       • Larger batch sizes consume correspondingly more memory
     • The similarity computation is split across accelerators (TPUs in this work)
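The memory trick above can be illustrated on a single machine: instead of materializing the full (batch × batch) similarity matrix at once, compute it in chunks. This is only a CPU sketch of the idea, not the paper's TPU-sharded implementation.

```python
import numpy as np

def chunked_similarity(src, tgt, chunk=256):
    """Compute src @ tgt.T one row-chunk at a time to bound peak working memory."""
    out = np.empty((len(src), len(tgt)))
    for start in range(0, len(src), chunk):
        out[start:start + chunk] = src[start:start + chunk] @ tgt.T
    return out

rng = np.random.default_rng(0)
src = rng.normal(size=(1000, 32))
tgt = rng.normal(size=(1000, 32))

# The chunked result matches the one-shot matrix product.
assert np.allclose(chunked_similarity(src, tgt), src @ tgt.T)
```

In the distributed setting each accelerator holds one chunk of rows, which is what makes the very large effective batch (and thus many in-batch negatives) affordable.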
  62. Evaluation tasks: bitext retrieval
     United Nations (UN)
     • Retrieve the document in another language corresponding to the English one (Precision@1 = accuracy)
     • 5 language pairs (en-fr, en-es, en-ru, en-ar, en-zh), 86,000 sentences
     Tatoeba
     • Retrieve the English translation corresponding to a non-English sentence (average accuracy)
     • A corpus of example sentences and their translations collected from https://tatoeba.org
     • 112 languages; 1,000 sentences per language with their English translations
     • Following prior work, a 36-language subset is also evaluated
     BUCC
     • Find translation pairs within monolingual corpora (Precision, Recall, F1)
     • 4 language pairs: fr-en, de-en, ru-en, zh-en
  63. Evaluation tasks: sentence embedding quality
     SentEval
     • Evaluates classification performance on downstream tasks
     • English only; LaBSE is multilingual, but its performance as a monolingual model is also evaluated
     Semantic Textual Similarity (STS)
     • Evaluates the correlation between model-computed and human-rated similarities
     • English only; again the multilingual model is evaluated as a monolingual one
     • Covered in the analysis section
  64. Experimental conditions
     • Vocabulary size
       • mBERT Vocab: same as multilingual BERT (mBERT) (119,547)
       • Customized Vocab: built from scratch with per-language data balancing (501,153)
     • Pre-training (PT)
       • Whether to pre-train with MLM + TLM
       • Without it, only the translation ranking task is run
     • Additive Margin Softmax (AMS)
       • Whether to use a margin in the translation ranking task
  65. Experimental settings
     • The sentence embedding is the [CLS] vector, L2-normalized
     • optimizer = AdamW, learning rate = 1e-3, seq length = 128
     Pre-training
     • batch size: 8192
     Translation ranking task
     • batch size: 4096
     • w/ pre-training: 50k steps; w/o pre-training: 500k steps
     • margin value: 0.3
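A small aside on the L2 normalization: once embeddings are normalized, their dot product equals their cosine similarity, which is why the paper's inner-product "similarity" and cosine similarity coincide (the appendix slide returns to this point). A minimal check:

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector (or batch of vectors) to unit L2 norm."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
a, b = rng.normal(size=32), rng.normal(size=32)  # stand-ins for two [CLS] vectors

cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
dot = l2_normalize(a) @ l2_normalize(b)
assert np.isclose(cos, dot)  # dot product of unit vectors == cosine similarity
```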
  66. Results: United Nations (UN) & Tatoeba
     • LaBSE (Customized Vocab + AMS + PT) sets a new SOTA
       • Yang et al. is a bilingual model and needs one model per language pair
       • LaBSE covers 109+ languages with a single model
     (Oddly, there is no result for Base w/ Customized Vocab)
  67. Results: United Nations (UN) & Tatoeba
     • On UN, pre-training (PT) is effective with both vocabularies
       • It also speeds up training on the translation ranking task
  68. Results: United Nations (UN) & Tatoeba
     • On Tatoeba, pre-training (PT) is not effective with the mBERT vocab
       • Cause: with the mBERT vocab, many tokens get replaced by [UNK]
       • e.g., in one language 71% of tokens become [UNK]
  69. Results: United Nations (UN) & Tatoeba
     • Performance trends differ between UN and Tatoeba
       • UN is the larger-scale bitext retrieval task
       • Detecting fine-grained differences between models requires a large-scale benchmark
  70. Results: BUCC
     • Compared with prior work, LaBSE achieves the best performance in almost all settings
     • A single model handles all language pairs
       • Yang et al. requires four models
  71. Results: SentEval
     • Transfer performance to English monolingual downstream tasks is not that high
       • Still roughly on par with English sentence embedding models
       • Arguably sufficient for a multilingual model
  72. Analysis: Additive Margin Softmax
     • Observes how UN performance changes with the margin value
       • The margin contributes to the performance gain
     • Gains saturate at a margin of about 0.3
       • Up to that point the improvement is consistent
       • All models improve
     (One wonders whether the curve really stays flat for margins beyond 0.4…)
  73. Analysis: usefulness of pre-training
     • PT improves performance overall
     • With PT, performance already converges after 50K steps of training
       • 50K steps = 200M examples
       • Less parallel data suffices
     • PT helps both performance and convergence speed (= fewer training examples needed)
  74. Analysis: performance on low-resource languages
     • Analyzes Tatoeba performance on low-resource languages
       • LaBSE stays strong even with low-data languages mixed in
     • Analyzes Tatoeba performance on languages with no en-xx parallel data at all
       • About a third exceed 75% accuracy, which is quite high
       • Likely due to similarity between languages plus the effect of large-scale training
  75. Analysis: STS
     • Analyzes LaBSE's performance on English STS
     • m-USE, which trains on NLI, performs very well
       • Even higher than SBERT?
     • LaBSE's numbers are on the low side
     • Training on translation pairs
       • is good at detecting semantic equivalence,
       • but cannot finely distinguish how much meanings differ
  76. Additional experiments: Mining Parallel Text from CommonCrawl
     • Experiment on mining parallel data from CommonCrawl
     • Train a machine translation model on the mined parallel corpus and evaluate it
       • An application-oriented test of the usefulness of multilingual sentence embeddings
     • Preprocess English, Chinese, and German CommonCrawl
       • First, embed every sentence
       • For every non-English sentence, match its nearest neighbor in embedding space
       • Discard pairs with similarity below 0.6
     • Evaluate translation quality in BLEU on WMT benchmark datasets
  77. Additional experiments: Mining Parallel Text from CommonCrawl
     • Compared with systems trained on human-curated parallel data
       • On en-de News, only 2.8 BLEU points below prior work
       • On en-zh News, nearly equal performance
     • Also comparable to existing systems on TED
       • Impressive: matching performance while training on automatically mined data
  78. Summary: Language-agnostic BERT Sentence Embedding
     • Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
     • Multilingual pre-training, then contrastive learning of sentence embeddings on a translation corpus
       • MLM + Translation Language Modeling (TLM) → Additive Margin Softmax
     • Broad evaluation experiments
       • Greatly improves cross-lingual retrieval performance
       • Especially strong on low-resource languages
       • Monolingual STS/SentEval scores are not high
       • Pre-training reduces the amount of parallel data needed
       • AMS has a large impact on performance
     • The pre-trained model is publicly available
     https://arxiv.org/abs/2007.01852
  79. Appendix

  80. Minor concerns
     • (First released in 2020, so) a little dated
       • SimCSE and DistilCSE are not used as baselines
     • Quite a few typos
     • Does not follow what are now standard practices for contrastive sentence embedding
       • Uses the inner product as sentence similarity
       • Does not use a temperature parameter
       • …though on closer reading it applies L2 normalization followed by scaling, which is essentially the same thing
  81. Side notes on STS
     • Recently, Spearman is preferred over Pearson for evaluation
       • There is an argument that Pearson is not a great evaluation metric [31]
     • Note that as long as Spearman is used, STS is a "ranking task"
     • Tuning hyperparameters on the STS Benchmark dev set has recently become popular
       • Evaluate on dev every 250 steps and test with the best checkpoint (SimCSE)
       • This seems prone to overfitting STS, so it does not look like a great policy…
       • Is "you cannot train on it, but you can use it as dev!" a good setup?
     • STS evaluation methodology sometimes differs between papers; caution is needed
       • Metrics and procedures used to vary (recently they have become unified)
       • Appendix B of the SimCSE paper [11] describes this and is recommended reading
     [31] Reimers+: Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity, COLING ’16
  82. What sentence embeddings embed (presenter's view)
     • In fact, things other than sentence meaning may be embedded
       • Word choice, style (formality, honorifics), question–answer affinity, etc.
     • A sentence embedding space is characterized by what it pulls together
       • How distance is defined determines the nature of the embeddings
     • Correspondence between the sentences pulled together in training and the "distance" that results (presenter's conjecture):
       • Entailment pairs: focus on meaning rather than surface similarity
       • Question–answer pairs: focus on the content the question and answer describe rather than the sentences' own meaning
       • Translation pairs: focus on sentence meaning while ignoring which language it is in
  83. What sentence embeddings embed (presenter's view)
     Methods that embed sentence meaning
     • InferSent: trains an LSTM with a dual-encoder (Siamese) structure on NLI classification [08]
     • Sentence-BERT: fine-tunes BERT with a dual-encoder structure on NLI classification [10]
     • Supervised SimCSE: contrastive learning with NLI entailment pairs as positives [11]
     Methods that embed enough information to reconstruct the sentence
     • SDAE: trains an LSTM to denoise and reconstruct the input sentence [32]
     • TSDAE: trains a Transformer to denoise and reconstruct the input sentence [33]
     • Optimus: a big VAE [34]
     [08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP ’17
     [10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP ’19
     [11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP ’21
     [32] Hill+: Learning Distributed Representations of Sentences from Unlabelled Data, NAACL ’16
     [33] Wang+: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning, EMNLP Findings ’21
     [34] Li+: OPTIMUS: Organizing Sentences via Pre-trained Modeling of a Latent Space, EMNLP ’20
  84. What sentence embeddings embed (presenter's view)
     Methods that embed information about surrounding sentences
     • Skip-Thought: generatively trained to reconstruct the previous and next sentences [06]
     • USE: Skip-Thought-style unsupervised training + supervised training on classification [09]
     Methods that embed sentence meaning so that a word's meaning can be composed from its definition
     • DefSent: trains embeddings of definition sentences to predict the defined word [35]
     Methods that are harder to pin down
     • Unsupervised SimCSE: contrastive learning with two dropout-perturbed encodings of the same sentence as positives [11]
     • DistilCSE: distillation via contrastive learning with teacher and student embeddings as positives [36, 37]
     [06] Kiros+: Skip-Thought Vectors, NIPS ’15
     [09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
     [35] Tsukagoshi+: DefSent: Sentence Embeddings using Definition Sentences, ACL ’21
     [36] Wu+: DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings, ARR ’22
     [37] Wu+: DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings, arXiv ’21 (same content as [36])
  85. Related paper: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [38]
     • Trains a multilingual sentence embedding model by knowledge distillation from a monolingual one
       • Uses a translation corpus to directly pull together embeddings of different languages
     • Good performance on cross-lingual STS
     • Also strong on monolingual STS
       • Expected, since it uses NLI?
     • Smaller language bias than LaBSE
     • A nice property: the resulting multilingual model has an embedding space with a structure similar to the teacher model's
       • Doing it sequentially with a single model breaks the space
     [38] Reimers+: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation, EMNLP ’20
 Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [38] 85