Slide 1

Language-agnostic BERT Sentence Embedding
Fangxiaoyu Feng, Yinfei Yang, Daniel Cer, Naveen Arivazhagan, Wei Wang (ACL 2022)
URL: https://arxiv.org/abs/2007.01852
Presenter: Hayato TSUKAGOSHI (M2, Graduate School of Informatics, Nagoya University, Japan)

Slide 2

LaBSE: Language-agnostic BERT Sentence Embedding (https://arxiv.org/abs/2007.01852)
•Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
•Multilingual pre-training → contrastive learning of sentence embeddings on a translation corpus
  • MLM + Translation Language Modeling → Additive Margin Softmax
•A variety of evaluation experiments
  • Substantially improves cross-lingual retrieval performance
  • Especially strong on low-resource languages
  • Monolingual STS/SentEval scores are not high
Why this paper was chosen
•Rich in sentence-embedding topics and evaluation methods
•An instructive paper on (multilingual) sentence embeddings

Slide 3

Introduction

Slide 4

Introduction: Sentence embedding
•A dense vector representation of a natural-language sentence
•Distances between vectors express how close sentences are in meaning
•Example (sentences mapped into the embedding space):
  • "A child is heading home." → [0.1, 0.2, ...]
  • "A child is heading home from school." → [0.1, 0.3, ...]
  • "A child is at the library." → [0.9, 0.8, ...]
  • "A child is walking in the afternoon." → [0.5, 0.7, ...]
•Sentences with close meanings (here, the first two) are placed near each other: the distance between vectors expresses the semantic relation


Slide 7

Introduction: Sentence embedding
•Origin of the name "embedding"
  • A word sequence is a sequence of extremely high-dimensional vectors (dimension = vocabulary size)
  • A sentence is instead represented by a lower-dimensional vector
  • Apparently a term borrowed from manifold theory
Usefulness and applications
•Similar sentence/document retrieval and clustering
•Lightweight text/document classification and feature extraction (reused in other tasks)
•Question answering via dense vector search (Dense Passage Retrieval)
•Example-based translation with translation memories, example-based machine learning
•The range of applications is very broad


Slide 9

Introduction: Representative sentence embedding methods
Before BERT
•Obtain sentence embeddings as a (weighted) average of word embeddings
  • p-mean [01], SWEM [02], DynaMax [03], SIF [04], uSIF [05], etc.
•Build dedicated sentence embedding models
  • Skip-Thought [06], SCDV [07], InferSent [08], USE [09], etc.
[01] Rückle+: Concatenated Power Mean Word Embeddings as Universal Cross-Lingual Sentence Representations, arXiv '18
[02] Shen+: Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, ACL '18
[03] Zhelezniak+: Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, ICLR '19
[04] Arora+: A Simple but Tough-to-Beat Baseline for Sentence Embeddings, ICLR '17
[05] Ethayarajh: Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline, Rep4NLP '18
[06] Kiros+: Skip-Thought Vectors, NIPS '15
[07] Mekala+: SCDV: Sparse Composite Document Vectors using soft clustering over distributional representations, ACL '17
[08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP '17
[09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018

Slide 10

Introduction: Representative sentence embedding methods
Sentence-BERT [10]
•Fine-tunes BERT on Natural Language Inference (NLI)
  • A pioneering study of sentence embedding models built on pre-trained language models (PLMs)
  • Essentially InferSent done with BERT
•Set a new state of the art (SOTA) by a wide margin at the time
  • SimCSE (described later) has become an almost strict upgrade, so it may see less use going forward?
[10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
(Figure reproduced from the cited paper)

Slide 11

Introduction: Representative sentence embedding methods
SimCSE [11]
•Fine-tunes BERT with contrastive learning
  • Unsupervised SimCSE: embed the same sentence twice and contrast the two views
  • Supervised SimCSE: contrastive learning with entailed sentence pairs as positives
•SOTA by a wide margin; follow-up work keeps appearing
[11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
(Figure reproduced from the cited paper; slides from an earlier reading-group session on SimCSE are available separately)

Slide 12

Introduction: Evaluating sentence embeddings
•How should the "goodness" of sentence embeddings be evaluated?
  • The discussion has arguably never been settled
  • What kind of sentence embeddings *should* be built is still an open question
•Even so, some criterion is needed for evaluation
Evaluation benchmarks
•Semantic Textual Similarity (STS): correlation between human-judged and model-computed sentence similarities
•SentEval: performance on downstream tasks such as text classification [12, 13]
•SentGLUE: GLUE [14] restricted to approaches that go through sentence embeddings [15]
•Clustering, text retrieval
•STS and SentEval are the most widely used
[12] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18
[13] Conneau+: What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties, ACL '18
[14] Wang+: GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, EMNLP Workshop BlackboxNLP '18
[15] Ni+: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models, CoRR '21


Slide 15

Introduction: Semantic Textual Similarity (STS)
•Evaluates how well a sentence embedding model captures meaning, via correlation with human judgments
•Each sentence pair is annotated by humans with a semantic similarity score (an example from an actual STS dataset is shown on the slide)
•The model's similarities are compared with the human ratings using a correlation coefficient
  • Pearson (product-moment) correlation
  • Spearman rank correlation
•Sentence-embedding evaluation uses the unsupervised setting
  • No training on STS data
  • A pre-trained model is evaluated as-is


Slide 18

Introduction: Unsupervised STS
Evaluation procedure (a code sketch follows below):
① Prepare a dataset of sentence pairs
② Prepare a sentence embedding model
③ Embed each sentence of every pair
④ Compute the similarity of each embedding pair
  • Cosine similarity is the usual choice
⑤ Compute the (rank) correlation between the model similarities and the human ratings

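A minimal sketch of this procedure in Python, assuming a model object with an `encode(sentences) -> np.ndarray` interface (as in, e.g., the sentence-transformers library) and human ratings `gold_scores`:

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_sts(model, sents_a, sents_b, gold_scores):
    emb_a = model.encode(sents_a)  # shape (n_pairs, dim)
    emb_b = model.encode(sents_b)
    # Step 4: cosine similarity of each pair (L2-normalize, then row-wise dot)
    emb_a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    emb_b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sims = (emb_a * emb_b).sum(axis=1)
    # Step 5: Spearman rank correlation against the human ratings
    return spearmanr(sims, gold_scores).correlation
```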

Slide 20

Introduction: Computing Spearman's rank correlation for STS (worked example; a code check follows below)
All pairs share sentence A: "A man is playing a guitar."
Sentence B                                 Human   Model 1  Model 2
• The man is playing the guitar.           4.909   0.985    0.978
• A guy is playing an instrument.          3.800   0.646    0.895
• A man is playing a guitar and singing.   3.200   0.874    0.977
• The girl is playing the guitar.          2.250   0.747    0.831
• A woman is cutting vegetable.            0.000   0.290    0.595
•Convert each column of similarity scores into a ranking of the sentence pairs:
  • Human: 1, 2, 3, 4, 5 / Model 1: 1, 4, 2, 3, 5 / Model 2: 1, 3, 2, 4, 5
•Compute the correlation between the gold ranking and each predicted ranking by plugging the rank differences into Spearman's formula ρ = 1 − 6Σd²/(n(n²−1)):
  • r₁ = 1 − 6/(5(5²−1)) · {(1−1)² + (2−4)² + (3−2)² + (4−5)² + (5−5)²} = 1 − (6/120)(0+4+1+1+0) = 0.7
  • r₂ = 0.9
•Model 2 is therefore the better model

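The same computation as a quick sanity check in Python (scipy reproduces the hand calculation above):

```python
from scipy.stats import spearmanr

human   = [4.909, 3.800, 3.200, 2.250, 0.000]
model_1 = [0.985, 0.646, 0.874, 0.747, 0.290]
model_2 = [0.978, 0.895, 0.977, 0.831, 0.595]

print(spearmanr(human, model_1).correlation)  # 0.7
print(spearmanr(human, model_2).correlation)  # 0.9
```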

Slide 28

Introduction: STS evaluation datasets
English datasets
•STS12, 13, 14, 15, 16 [16, 17, 18, 19, 20]
•STS Benchmark (test set) [21]
•SICK-R [22]
Japanese datasets
•JSICK [23]
•JSTS [24]
Notes
•STS12-16 are each a collection of small "sub"-datasets; the correlation is usually computed with the sub-datasets mixed together
•Final evaluation is most often the average score over STS12-16, STS Benchmark, and SICK-R
[16] Agirre+: SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, *SEM '12
[17] Agirre+: *SEM 2013 shared task: Semantic Textual Similarity, *SEM '13
[18] Agirre+: SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, SemEval '14
[19] Agirre+: SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, SemEval '15
[20] Agirre+: SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation, SemEval '16
[21] Cer+: SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, SemEval '17
[22] Marelli+: A SICK cure for the evaluation of compositional distributional semantic models, LREC '14
[23] Yanaka+: JSICK: Construction of a Japanese Compositional Inference and Similarity Dataset, JSAI Annual Conference '21
[24] Kurihara+: JGLUE: A Japanese Language Understanding Benchmark, ANLP Annual Meeting '22


Slide 30

Introduction: SentEval [25]
•A toolkit bundling downstream tasks such as text classification
•Train a classifier that takes sentence embeddings as input; judge embedding quality by classification performance
Task list:
Task   Type                          #train  #test  #class
MR     movie review                  11,000  11,000  2
CR     product review                 4,000   4,000  2
SUBJ   subjectivity status           10,000  10,000  2
MPQA   opinion-polarity              11,000  11,000  2
SST-2  binary sentiment analysis     67,000   1,800  2
TREC   question-type classification   6,000     500  6
MRPC   paraphrase detection           4,100   1,700  2
[25] Conneau+: SentEval: An Evaluation Toolkit for Universal Sentence Representations, LREC '18


Slide 32

Introduction: SentEval
Evaluation procedure (a code sketch follows below):
① Prepare a sentence embedding model with its parameters frozen
② Train a classifier that takes the sentence embeddings as input
③ Judge the quality of the embeddings by the classifier's performance
•Assumption: the higher the classification performance, the "better" the embeddings
•The classifier is usually logistic regression
  • i.e., classification by a weighted sum over the embedding dimensions
•Evaluates the performance of a *pre-trained* sentence embedding model
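A sketch of such a probe, assuming the same hypothetical `model.encode` interface as before and scikit-learn for the classifier:

```python
from sklearn.linear_model import LogisticRegression

def probe(model, train_texts, train_labels, test_texts, test_labels):
    # The encoder stays frozen; only the logistic regression is trained.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(model.encode(train_texts), train_labels)
    # Test accuracy serves as a proxy for embedding quality.
    return clf.score(model.encode(test_texts), test_labels)
```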

Slide 33

Proposed method: LaBSE

Slide 34

LaBSE: Language-agnostic BERT Sentence Embedding (https://arxiv.org/abs/2007.01852)
•Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
•Multilingual pre-training → contrastive learning of sentence embeddings on a translation corpus
  • MLM + Translation Language Modeling → Additive Margin Softmax
•A variety of evaluation experiments
  • Substantially improves cross-lingual retrieval performance
  • Especially strong on low-resource languages
  • Monolingual STS/SentEval scores are not high

Slide 35

Outline
•Comparison with prior work
•Components of LaBSE
  • Dual-encoder architecture
  • Translation ranking task
  • MLM and TLM pre-training
•Related work on multilingual sentence embeddings
•Experimental setup, training tricks, evaluation methods
•Experimental results
•Analysis
•Additional experiments

Slide 36

LaBSE: Comparison with prior work
•Pre-training → fine-tuning to build a multilingual sentence embedding model
•Strong performance on low-resource languages
  • Also strong on language pairs absent from the training data
•The individual methods and techniques are almost all from prior work
  • The contribution of this work is a well-chosen combination + large-scale data + engineering for large-scale training

Slide 37

Components of LaBSE
Dual-encoder architecture
•A training strategy; standard for sentence embedding methods
•Sentence-BERT, SimCSE, and others use the same scheme
Translation ranking task
•Train so that a translation pair's similarity is higher than the similarity of any other sentence pair
•Training is refined with additive margin softmax
MLM and TLM pre-training
•Masked Language Modeling (MLM)
•Translation Language Modeling (TLM)

Slide 38

Components of LaBSE: Dual-encoder architecture
•Two encoders produce the sentence embedding representations, with the loss computed between their outputs
  • In most cases the encoders share weights (i.e., they are the same model)
  • Also called a Siamese network
•(Diagram) Encoder-Decoder vs. weight-shared Dual-Encoder
  • Typical EncDec tasks: generating surrounding sentences, translation generation, denoising autoencoding
  • Typical dual-encoder tasks: recognizing textual entailment, contrastive learning


Slide 40

Components of LaBSE: Translation ranking task
•Proposed by Guo et al. [26]
•Train so that a translation pair's similarity is higher than that of any other sentence pair
  • Contrastive learning with translation pairs as positives
•Despite the name "ranking", what it actually does is maximize the similarity of the positives (correct translation pairs)
•Example: 吾輩は猫である。(Ja) ↔ "I am a cat." (En) is a positive pair: pulled together, similarity maximized; "Nice to meet you." (En) is a negative: pushed apart, similarity minimized
[26] Guo+: Effective Parallel Corpus Mining using Bilingual Sentence Embeddings, WMT '18


Slide 43

Components of LaBSE: Translation ranking task
•First, embed the translation pairs
  • e.g. Ja: 私はペンです。/ 吾輩は猫である。/ はじめまして。/ 私は完璧な人間です。 vs. En: I am a pen. / I'm a cat. / Nice to meet you. / I'm a perfect human.
•Compute similarities for every combination of positives and negatives
  • This yields a similarity matrix (e.g., a true pair scores 0.98…, a mismatched pair 0.24…)
•Maximize the similarity of the positives
  • i.e., the diagonal of the similarity matrix is the correct answer
  • Normalize each row (→) with a softmax; think of it as 1-vs-N classification repeated N times
•The loss function is L = −(1/N) Σᵢ log[ exp(φ(xᵢ, yᵢ)) / Σₙ exp(φ(xᵢ, yₙ)) ]
  • φ is the dot product of the two embeddings


Slide 47

Components of LaBSE: Translation ranking task
•Use the other pairs in the same batch as negatives
  • Known as in-batch negatives
  • The similarity matrix becomes a (batch_size × batch_size) square matrix
  • (Even more negatives can be added on top of this)
•The row-wise softmax makes the loss asymmetric
  • To fix this, the losses in the two directions (rows → and columns ↓) are summed; see the sketch below

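A sketch of this bidirectional in-batch-negative loss in PyTorch (an illustration, not the authors' code; `src_emb`/`tgt_emb` are the L2-normalized embeddings of one batch of translation pairs):

```python
import torch
import torch.nn.functional as F

def ranking_loss(src_emb, tgt_emb):
    sim = src_emb @ tgt_emb.T                              # (B, B) similarity matrix
    labels = torch.arange(sim.size(0), device=sim.device)  # diagonal = positives
    # Softmax over rows (source -> target) and over columns (target -> source),
    # summed so the loss is symmetric in the two languages.
    return F.cross_entropy(sim, labels) + F.cross_entropy(sim.T, labels)
```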

Slide 52

Components of LaBSE: Additive Margin Softmax (AMS) [27]
•Replace the similarity function φ with a margin variant φ′
  • φ′(xᵢ, yⱼ) = φ(xᵢ, yⱼ) − m for positives (i = j), and φ(xᵢ, yⱼ) otherwise
  • The margin is applied only to the positives
•Positives are pulled closer together, negatives pushed further apart
•The modified loss is the ranking loss above with φ replaced by φ′ (see the sketch below)
[27] Yang+: Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax, IJCAI '19

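Continuing the previous sketch, the additive margin only changes the diagonal of the similarity matrix before the softmax (0.3 is the margin value used in the experiments later):

```python
def ams_ranking_loss(src_emb, tgt_emb, margin=0.3):
    sim = src_emb @ tgt_emb.T
    # phi'(x_i, y_i) = phi(x_i, y_i) - m: subtract the margin from positives only.
    sim = sim - margin * torch.eye(sim.size(0), device=sim.device)
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels) + F.cross_entropy(sim.T, labels)
```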

Slide 55

Components of LaBSE: MLM and TLM pre-training
Masked Language Modeling (MLM)
•Train the model to predict the words hidden behind [MASK] tokens
  • Self-supervised learning that acquires general linguistic knowledge
•Familiar from BERT [00]
•Example: "[CLS] 吾輩 は [MASK] である 。 [SEP]" → BERT predicts 猫

Slide 56

Components of LaBSE: MLM and TLM pre-training
Translation Language Modeling (TLM) [28]
•Concatenate a translation pair and run MLM on the combined sequence
  • The hope is that the model learns the alignment between the two languages
•Example: "吾輩 は [MASK] である 。 [/s] I [MASK] a cat [/s]" → the Transformer predicts 猫 and am
[28] Conneau+: Cross-lingual Language Model Pretraining, NeurIPS '19

Slide 57

Components of LaBSE: MLM and TLM pre-training
•TLM is an extension of MLM
  • With minor changes it can be trained in the same way as MLM (a data-construction sketch follows below)
•LaBSE pre-trains with MLM and TLM combined
•Unlike [28], this work does not use language embeddings, in order to improve multilinguality
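A simplified sketch of building one TLM training example (illustrative only; real MLM/TLM masking also uses random/keep replacements rather than always inserting [MASK]):

```python
import random

def make_tlm_example(src_tokens, tgt_tokens, p=0.15):
    # Concatenate the translation pair into one sequence, so masked tokens
    # can be predicted from the other language's context as well.
    tokens = src_tokens + ["[/s]"] + tgt_tokens + ["[/s]"]
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok != "[/s]" and random.random() < p:
            labels[i] = tok        # the prediction target
            tokens[i] = "[MASK]"   # what the model actually sees
    return tokens, labels

# make_tlm_example("吾輩 は 猫 で ある 。".split(), "I am a cat".split())
```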

Slide 58

Related work: LASER [29]
•Obtains a sentence embedding model by solving a BiLSTM seq2seq translation task
  • The encoder extracts a language-independent sentence representation
  • The decoder generates the sentence, with the sentence embedding and a language ID as auxiliary inputs
•Strong performance on cross-lingual NLI (XNLI) and multilingual retrieval tasks; covers 97 languages
[29] Artetxe+: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, TACL '18

Slide 59

Related work: m-USE [30]
•Obtains a high-performing multilingual sentence embedding model through multi-task learning
  • The translation ranking task is among the training tasks
  • (Apparently) no self-supervised pre-training is performed
[30] Yang+: Multilingual Universal Sentence Encoder for Semantic Retrieval, ACL: System Demonstrations '20

Slide 60

Training data
Monolingual data
•Collected from CommonCrawl and Wikipedia; 17B sentences
•Preprocessed; used only for pre-training (MLM)
Bilingual translation pairs
•Collected by mining translations from web pages (bitext mining); 6B pairs
•To counter data imbalance, each language is capped at 100M sentences
•Low-quality data filtered using human evaluation of a subset
•Used for pre-training (MLM & TLM) and for training the dual encoder

Slide 61

Training tricks
•Contrastive methods generally perform better with more negatives
  • With in-batch negatives, each example gets batch_size − 1 negatives
  • Enlarging the batch size eats a correspondingly large amount of memory
•The similarity computation is therefore sharded per accelerator (TPUs in this work)

Slide 62

Evaluation tasks: bitext retrieval (a Precision@1 sketch follows below)
United Nations (UN)
•From an English sentence, retrieve the corresponding sentence in another language (Precision@1 = accuracy)
•5 language pairs (en-fr, en-es, en-ru, en-ar, en-zh), 86,000 sentences
Tatoeba
•From a non-English sentence, retrieve the corresponding English translation (average accuracy)
•A corpus of example sentences and their linked translations collected from https://tatoeba.org
•112 languages; 1000 sentences per language with corresponding English translations
•Following prior work, a 36-language subset is also evaluated
BUCC
•Find translation pairs in monolingual corpora (Precision, Recall, F1)
•4 language pairs (fr-en, de-en, ru-en, zh-en)
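A sketch of Precision@1 for this kind of retrieval, where row i of `src_emb` and row i of `tgt_emb` embed a gold translation pair (both L2-normalized, so the dot product is cosine similarity):

```python
import numpy as np

def precision_at_1(src_emb, tgt_emb):
    sim = src_emb @ tgt_emb.T                 # (N, N) similarity matrix
    # For each source sentence: is its nearest target the true translation?
    hits = sim.argmax(axis=1) == np.arange(sim.shape[0])
    return float(hits.mean())
```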

Slide 63

Evaluation tasks: sentence embedding quality
SentEval
•Evaluates classification performance on downstream tasks
•English only; LaBSE is multilingual, but its performance as a monolingual model is also measured
Semantic Textual Similarity (STS)
•Evaluates the correlation between model similarities and human-judged similarities
•English only; again measured as a monolingual model
•Covered in the post-hoc analysis section

Slide 64

Experimental conditions
•Vocabulary size
  • mBERT Vocab: same as multilingual BERT (mBERT) (119,547)
  • Customized Vocab: built from scratch with per-language data-imbalance countermeasures (501,153)
•Pre-training (PT)
  • Whether to pre-train with MLM+TLM
  • If not, only the translation ranking task is run
•Additive Margin Softmax (AMS)
  • Whether to use a margin in the translation ranking task

Slide 65

Experimental settings
•The sentence embedding is the [CLS] vector, L2-normalized (see the sketch below)
•optimizer = AdamW, learning rate = 1e-3, seq length = 128
Pre-training
•batch size: 8192
Translation ranking task
•batch size: 4096
•w/ pre-training: 50k steps; w/o pre-training: 500k steps
•margin value: 0.3
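A sketch of that embedding extraction in PyTorch, assuming a Hugging Face-style encoder whose first output is the final hidden states of shape (batch, seq_len, dim):

```python
import torch.nn.functional as F

def embed(encoder, input_ids, attention_mask):
    hidden = encoder(input_ids, attention_mask=attention_mask)[0]
    cls = hidden[:, 0]                # the [CLS] token's final hidden state
    return F.normalize(cls, dim=-1)   # L2-normalize to unit length
```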

Slide 66

Results: United Nations (UN) & Tatoeba
•LaBSE (Customized Vocab + AMS + PT) sets a new SOTA
  • Yang et al. is a bilingual model: a separate model is needed per language pair
  • LaBSE handles 109+ languages with a single model
•(Oddly, no result is reported for Base w/ Customized Vocab)

Slide 67

Results: United Nations (UN) & Tatoeba
•On UN, pre-training (PT) is effective with both vocabularies
  • It also speeds up training on the translation ranking task

Slide 68

Results: United Nations (UN) & Tatoeba
•On Tatoeba, pre-training (PT) is not effective with the mBERT Vocab
  • The cause is that many tokens get replaced by [UNK] under the mBERT Vocab
  • For one language, for example, 71% of tokens become [UNK]

Slide 69

Results: United Nations (UN) & Tatoeba
•Performance trends differ between UN and Tatoeba
  • UN is the larger-scale bitext retrieval benchmark
  • Detecting fine-grained differences between models requires a large-scale benchmark

Slide 70

Results: BUCC
•Compared with prior work, LaBSE achieves the best performance in almost all settings
•A single model covers all language pairs
  • Yang et al. needs four separate models

Slide 71

Results: SentEval
•Transfer performance to English-only downstream tasks is not especially high
  • Still roughly on par with English sentence embedding models
  • Arguably sufficient for a multilingual model

Slide 72

Analysis: Additive Margin Softmax
•Observes how UN performance changes with the margin value
  • The margin contributes to the performance gains
•Performance saturates around a margin of 0.3
  • Up to that point the improvement is consistent
  • All models improve
•(One wonders whether the curve really stays flat when the margin is pushed past 0.4…)

Slide 73

Analysis: usefulness of pre-training
•PT improves performance across the board
•With PT, performance has already converged after 50K training steps
  • 50K steps = 200M examples
  • i.e., less parallel data is needed
•PT helps both final performance and convergence speed (= fewer training examples)

Slide 74

Analysis: performance on low-resource languages
•Analyzes Tatoeba performance on low-resource languages
  • LaBSE stays strong even with low-data languages mixed in
•Also analyzes Tatoeba performance on languages with no en-xx parallel data at all
  • About 30% of them exceed 75% accuracy: quite high
  • Plausibly an effect of similarity between languages plus training on large-scale data

Slide 75

Analysis: STS
•Analyzes LaBSE's performance on English STS
•m-USE, which trains on NLI, performs very well
  • Even higher than SBERT?
•LaBSE's numbers are on the low side
•Training on translation pairs
  • excels at detecting semantic equivalence
  • but cannot finely distinguish *how much* two meanings differ

Slide 76

Additional experiments: Mining Parallel Text from CommonCrawl
•An experiment mining parallel data from CommonCrawl
•A machine translation model is trained on the mined parallel corpus and evaluated
  • An application-side test of the multilingual sentence embedding model's usefulness
•English, Chinese, and German CommonCrawl are preprocessed
  • First, embed every sentence
  • For every non-English sentence, find its nearest neighbor in embedding space (see the sketch below)
  • Pairs with similarity below 0.6 are discarded
•Translation quality is evaluated with BLEU on WMT benchmark datasets
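A brute-force sketch of the mining step (at CommonCrawl scale the nearest-neighbor search would in practice use an approximate index, but the logic is the same; embeddings are assumed L2-normalized):

```python
import numpy as np

def mine_pairs(non_en_emb, en_emb, threshold=0.6):
    sim = non_en_emb @ en_emb.T          # (N_xx, N_en) cosine similarities
    nn = sim.argmax(axis=1)              # nearest English sentence per row
    best = sim[np.arange(len(nn)), nn]
    # Keep only pairs whose similarity clears the threshold (0.6 in the paper).
    return [(int(i), int(nn[i])) for i in np.where(best >= threshold)[0]]
```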

Slide 77

Additional experiments: Mining Parallel Text from CommonCrawl
•Comparison against systems trained on human-curated parallel data
  • On en-de News, only 2.8 BLEU points below prior work
  • On en-zh News, roughly on par
•On TED, likewise comparable to existing systems
  • Matching them while training on automatically mined data is impressive

Slide 78

Summary: Language-agnostic BERT Sentence Embedding (https://arxiv.org/abs/2007.01852)
•Proposes LaBSE, a multilingual sentence embedding model applicable to 109+ languages
•Multilingual pre-training → contrastive learning of sentence embeddings on a translation corpus
  • MLM + Translation Language Modeling → Additive Margin Softmax
•A variety of evaluation experiments
  • Substantially improves cross-lingual retrieval performance
  • Especially strong on low-resource languages
  • Monolingual STS/SentEval scores are not high
  • Pre-training reduces the amount of parallel data needed
  • AMS has a large impact on performance
•The pre-trained model is publicly available (a usage sketch follows below)
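A minimal usage sketch, assuming the released checkpoint published on the Hugging Face Hub under the name "sentence-transformers/LaBSE":

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")
emb = model.encode(["吾輩は猫である。", "I am a cat.", "Nice to meet you."])
# LaBSE embeddings are L2-normalized, so dot products are cosine similarities;
# the Ja-En translation pair should score highest.
print(emb @ emb.T)
```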

Slide 79

Appendix

Slide 80

A few nitpicks
•(The work first appeared in 2020, but) it is a bit dated
  • SimCSE and DistilCSE are not used as comparison targets
•Quite a few typos (;;)
•Does not use what is now standard practice for contrastive sentence embedding methods
  • Uses the dot product as sentence similarity
  • Does not use a temperature parameter
  • …though since it L2-normalizes and then scales, this turns out to be essentially the same thing

Slide 81

Introduction (appendix): asides on STS
•Recently Spearman, not Pearson, is the usual evaluation metric
  • There is an argument that Pearson is not a great evaluation metric [31]
•Note that, as long as Spearman is used, STS is a *ranking* task
•Tuning hyperparameters on the STS Benchmark dev set has recently become fashionable
  • e.g., evaluate on dev every 250 steps and run the test evaluation with the best checkpoint (SimCSE)
  • This invites overfitting to STS, so it does not seem like a great policy…
  • Is "you may not train on it, but you may use it as dev!" really a good setup?
•Beware that STS evaluation methodology sometimes differs from paper to paper
  • Metrics and procedures used to vary (they are unified these days)
  • Appendix B of the SimCSE paper [11] describes this; recommended reading
[31] Reimers+: Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity, COLING '16

Slide 82

Introduction (appendix): what sentence embeddings embed (presenter's take)
•In fact, things other than sentence meaning may be embedded
  • Word choice, style (formality, honorifics), question-answer affinity, etc.
•A sentence embedding space is characterized by *what* it places close together
  • How distance is defined determines the nature of the embeddings
•Correspondence between the sentences pulled together in training and the "distance" that results (presenter's conjecture):
  • Entailed sentence pairs: attend to meaning rather than surface similarity
  • Question-answer pairs: attend to what the question and answer are about rather than the sentences' own meanings
  • Translation pairs: ignore which language it is and attend to meaning

Slide 83

Introduction (appendix): what sentence embeddings embed (presenter's take)
Methods that embed sentence meaning
•InferSent: trains an LSTM on NLI classification in a dual-encoder (Siamese) setup [08]
•Sentence-BERT: fine-tunes BERT on NLI classification in a dual-encoder setup [10]
•Supervised SimCSE: contrastive learning with entailed NLI sentence pairs as positives [11]
Methods that embed enough information to reconstruct the sentence
•SDAE: trains an LSTM to denoise and reconstruct the input sentence [32]
•TSDAE: trains a Transformer to denoise and reconstruct the input sentence [33]
•Optimus: a big VAE [34]
[08] Conneau+: Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, EMNLP '17
[10] Reimers+: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, EMNLP '19
[11] Gao+: SimCSE: Simple Contrastive Learning of Sentence Embeddings, EMNLP '21
[32] Hill+: Learning Distributed Representations of Sentences from Unlabelled Data, NAACL '16
[33] Wang+: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning, EMNLP findings '21
[34] Li+: OPTIMUS: Organizing Sentences via Pre-trained Modeling of a Latent Space, EMNLP '20

Slide 84

Introduction (appendix): what sentence embeddings embed (presenter's take)
Methods that embed information about neighboring sentences
•Skip-Thought: generatively trained to reconstruct the surrounding sentences [06]
•USE: Skip-Thought-style unsupervised training + supervised training on classification [09]
Methods that embed sentence meaning so that a word's meaning can be composed from its definition sentence
•DefSent: trains embeddings to predict the defined word from its definition sentence [35]
Methods that are harder to pin down
•Unsupervised SimCSE: contrastive learning with the same sentence under different dropout as positives [11]
•DistilCSE: distillation by contrastive learning with teacher and student embeddings as positives [36, 37]
[06] Kiros+: Skip-Thought Vectors, NIPS '15
[09] Cer+: Universal Sentence Encoder, arXiv, Mar 2018
[35] Tsukagoshi+: DefSent: Sentence Embeddings using Definition Sentences, ACL '21
[36] Wu+: DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings, ARR '22
[37] Wu+: DisCo: Effective Knowledge Distillation For Contrastive Learning of Sentence Embeddings, arXiv '21 (same content as [36])

Slide 85

A related paper: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation [38]
•Trains a multilingual sentence embedding model by knowledge distillation from a monolingual one
  • Uses a translation corpus to directly pull together embeddings of the same sentence in different languages
•Good performance on cross-lingual STS
•Also strong on monolingual STS
  • Perhaps to be expected, since it uses NLI?
•Smaller language bias than LaBSE
•The nice property: it yields a multilingual model whose embedding space has a structure similar to the teacher model's
  • Doing it sequentially with a single model would break the space
[38] Reimers+: Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation, EMNLP '20