
[Seminar Slides] Text Embeddings by Weakly-Supervised Contrastive Pre-training


Seminar slides explaining E5, a strong text embedding model trained with large-scale weakly supervised contrastive learning.

元論文: https://arxiv.org/abs/2212.03533

Hayato Tsukagoshi

May 07, 2024


Transcript

1. Text Embeddings by Weakly-Supervised Contrastive Pre-training
   Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei
   https://arxiv.org/abs/2212.03533
   Presenter: Hayato Tsukagoshi (Graduate School of Informatics, Nagoya University, Japan)
2. Overview
   • Proposes E5, a text embedding model built by large-scale contrastive pre-training
   • Builds a weakly supervised dataset from semi-structured data plus filtering
   • Contrastive pre-training with a batch size of 32,000
   • Fine-tuning that exploits hard negatives and knowledge distillation from a Cross-Encoder
   • On average across a range of benchmarks, E5 outperforms existing models
   Model sizes:
   E5-small: 12 layers, hidden size 384, 33M parameters
   E5-base:  12 layers, hidden size 768, 110M parameters
   E5-large: 24 layers, hidden size 1024, 330M parameters
3. Language Models
   • Most recent models are Transformers built on the attention mechanism
   • Many kinds of language models exist
   Autoregressive language models (Causal LM)
   • Trained to predict words from left to right
   • Examples: GPT, GPT-2, GPT-3, Llama 2, ...
   Many other kinds of language models also exist, e.g. XLNet, ELECTRA, UL2, ...
   (Figure: overview of BERT)
4. Language Models (continued)
   In addition to the autoregressive models on the previous slide:
   Masked language models (Masked LM)
   • Trained by masking and predicting parts of the input sentence
   • Examples: BERT, RoBERTa, DeBERTa, ...
5. Attention Mechanism
   • A mechanism that maps a sequence of vectors to another sequence of vectors
   • The input is split into Q (Query), K (Key), and V (Value)
     • K, V: n d-dimensional vectors
     • Q: m d-dimensional vectors
   (Figure from Jaegle et al., Perceiver IO: A General Architecture for Structured Inputs & Outputs, ICLR 2022.
    The slide also links to a Japanese explainer video on attention: Deep Learning World vol. 24.)
6. Attention Mechanism (continued)
   • The importance of each V with respect to Q is computed from the dot products of Q and K followed by Softmax
     • Attention weights: the resulting (m × n) matrix
7. Attention Mechanism (continued)
   • Self-Attention: Q, K, and V are built from the same vector sequence (i.e. n = m)
   • Cross-Attention: Q and (K, V) come from different vector sequences
   (A minimal computational sketch follows below.)
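To make the shapes above concrete, here is a minimal sketch of (scaled) dot-product attention: m query vectors attend over n key/value vectors, and the attention weights form an (m × n) matrix. The scaling by √d and all names are illustrative, not taken from the slides.

```python
import torch
import torch.nn.functional as F

def dot_product_attention(Q, K, V):
    """Q: (m, d), K: (n, d), V: (n, d) -> output: (m, d), weights: (m, n)."""
    d = Q.size(-1)
    scores = Q @ K.T / d ** 0.5          # dot products of queries and keys, scaled by sqrt(d)
    weights = F.softmax(scores, dim=-1)  # each row is a distribution over the n values
    return weights @ V, weights

m, n, d = 4, 6, 8
Q, K, V = torch.randn(m, d), torch.randn(n, d), torch.randn(n, d)
out, attn = dot_product_attention(Q, K, V)
print(out.shape, attn.shape)  # torch.Size([4, 8]) torch.Size([4, 6])
```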
8. Transformer
   • A model architecture built entirely from attention
     • Removes the RNNs, LSTMs, and CNNs that had been widely used in NLP until then
   Vaswani et al., Attention Is All You Need, NeurIPS 2017.
   (Figure: architecture overview with Encoder and Decoder.
    The slide also links to a Japanese explainer video on Multi-Head Attention: Deep Learning World vol. 28.)
9. Transformer (continued)
   • Takes a sequence of vectors as input and outputs a sequence of vectors
     • Models the interactions among the input vectors
10. Transformer (continued)
    • Comes in two parts, an Encoder and a Decoder
      • Encoder-only: BERT, LUKE, ...
      • Decoder-only: GPT, GPT-2, GPT-3, ...
      • Encoder-Decoder: BART, T5, UL2, ...
11. BERT: Bidirectional Encoder Representations from Transformers
    • Stacks multiple Transformer Encoder layers and pre-trains them at scale
      • base has 12 layers (~110M parameters); large has 24 layers (~330M parameters)
    • Popularized the pre-training → fine-tuning paradigm
    Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, NAACL 2019.
12. Sentence Embedding
    • A dense vector representation of a natural-language sentence
    • The distance between vectors represents how close the sentences are in meaning
    (Figure: a sentence embedding space. Example sentences such as "A child is heading home.",
     "A child is heading home from school.", "A child is at the library.", and "A child is walking
     in the afternoon." are mapped to vectors like [0.1, 0.2, ...]; sentences with similar meanings
     are placed close together, so distances between vectors express semantic relations.)
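A tiny sketch of the underlying idea that closeness between embedding vectors (here measured with cosine similarity) encodes semantic closeness. The 2-dimensional vectors are made-up toy values, not outputs of any real embedding model.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: the first two "sentences" are meant to be semantically close,
# the third one is not.
v_home        = np.array([0.1, 0.2])
v_home_school = np.array([0.1, 0.3])
v_library     = np.array([0.9, -0.5])

print(cosine_similarity(v_home, v_home_school))  # high similarity
print(cosine_similarity(v_home, v_library))      # much lower similarity
```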
13. The Evolution of Sentence Embedding Research
    Early period (~2018)
    • Building sentence embeddings from static word embeddings (Word2Vec, GloVe) was mainstream
      • SIF, uSIF, All-but-the-Top, ...
    • A few methods trained from scratch with LSTMs and the like
      • SkipThought, InferSent, Universal Sentence Encoder (USE), ...
    Rise of fine-tuning pre-trained models (2019–2021)
    • More and more methods obtain sentence embedding models by fine-tuning BERT
      • BERT-flow, Sentence-BERT (SBERT), ...
    (Note: the years are rough.)
14. NLI Datasets
    Natural Language Inference (NLI)
    • Sentence pairs (premise, hypothesis) are annotated with a label (entailment, contradiction, neutral)
    • The task is to predict the semantic relation between the two sentences
    Examples:
    Premise: "A man playing an electric guitar on stage."  Hypothesis: "A man playing guitar on stage."     Label: entailment
    Premise: "A man playing an electric guitar on stage."  Hypothesis: "A man playing banjo on the floor."  Label: contradiction
    Premise: "A man playing an electric guitar on stage."  Hypothesis: "A man is performing for cash."      Label: neutral
15. The Evolution of Sentence Embedding Research (continued)
    The rise of contrastive learning (2021~)
    • Contrastive learning, popular in computer vision, reached sentence embeddings
    • SimCSE became the representative method; it proposes a supervised and an unsupervised variant
    Unsupervised SimCSE
    1. Forward the same input twice with different dropout masks
    2. Use the two resulting outputs for the same input as a positive pair for contrastive learning (see the sketch below)
    Supervised SimCSE
    • Use sentence pairs in the "entailment" relation from NLI datasets as positives
    (The slide also links to a technical report on Japanese SimCSE.)
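A minimal sketch of the Unsupervised SimCSE trick, assuming a generic PyTorch encoder that contains dropout; this illustrates the idea and is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy encoder; any dropout-bearing encoder (e.g. BERT) behaves the same way.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(p=0.1), nn.Linear(32, 8))
encoder.train()                 # keep dropout active so the two passes differ

x = torch.randn(4, 16)          # a batch of 4 "sentences" (toy features)
z1 = encoder(x)                 # first forward pass: one dropout mask
z2 = encoder(x)                 # second forward pass: a different dropout mask

# (z1[i], z2[i]) is a positive pair; z2[j] for j != i serve as in-batch negatives.
sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1)  # (4, 4)
loss = F.cross_entropy(sim / 0.05, torch.arange(4))                  # InfoNCE with SimCSE's temperature 0.05
```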
16. Contrastive Learning
    • One family of representation-learning methods
    • Trains embeddings so that positives move closer together and negatives move apart
      • Maximize the similarity between positives, minimize the similarity between negatives
    Computing the (InfoNCE) loss (written out as a formula below):
    • Compute the cosine similarity between the embeddings of the positive pair
    • Compute the cosine similarities between the embeddings of the negative pairs
    • Stack the similarities and apply the temperature parameter
    • Apply the Softmax function and treat the result as a probability distribution
    • Push that distribution toward the one-hot distribution that puts all mass on the positive
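Written out, the steps above are the standard InfoNCE loss. The notation (query embedding q, positive p⁺, negatives pᵢ⁻, temperature τ, cosine similarity cos) follows common usage rather than the slide itself:

```latex
\mathcal{L}_{\mathrm{InfoNCE}}
  = -\log \frac{\exp\!\big(\cos(q, p^{+})/\tau\big)}
               {\exp\!\big(\cos(q, p^{+})/\tau\big)
                + \sum_{i=1}^{N} \exp\!\big(\cos(q, p_{i}^{-})/\tau\big)}
```

Minimizing this cross-entropy is exactly the last bullet: pushing the softmax distribution over the similarities toward the one-hot distribution on the positive.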
17. The Evolution of Sentence Embedding Research (continued)
    Scaling era (2022~)
    • Training is being scaled up aggressively
      • Scaling contrastive learning through more data and larger batch sizes
      • Scaling model parameters
    • Introduction of multi-stage contrastive learning
      • Pre-training on weakly supervised data → fine-tuning on supervised data
      • Example models: E5, GTE, BGE, ...
    • Research on text embeddings built from large language models (LLMs) is also progressing
      • PromptEOL, E5-Mistral, LLM2Vec, ...
  18. E5

19. Overview (recap)
    • Proposes E5, a text embedding model built by large-scale contrastive pre-training
    • Builds a weakly supervised dataset from semi-structured data plus filtering
    • Contrastive pre-training with a batch size of 32,000
    • Fine-tuning that exploits hard negatives and knowledge distillation from a Cross-Encoder
    • On average across a range of benchmarks, E5 outperforms existing models
    Model sizes:
    E5-small: 12 layers, hidden size 384, 33M parameters
    E5-base:  12 layers, hidden size 768, 110M parameters
    E5-large: 24 layers, hidden size 1024, 330M parameters
20. CCPairs: a Large-Scale Dataset for Contrastive Pre-training
    • For deep models, data quality and diversity largely determine performance
    • Yet datasets for training text embedding models are scarce
      • Prior work relies on small, manually built resources such as Stanford NLI and MS-MARCO
    • E5 therefore builds a large-scale dataset for training text embedding models
      • Text pairs are collected from semi-structured data (1.3B pairs before filtering)
    Examples of the collected sources and their query/passage format:
    Wikipedia:    query = entity + section title, passage = passage,         24M pairs
    Reddit:       query = post,                   passage = upvoted comment, 60M pairs
    Common Crawl: query = title,                  passage = passage,         69M pairs
21. CCPairs: Denoising with a Consistency Filter
    • Filtering is applied to improve data quality and cut training cost
      • The data is eventually reduced to 270M pairs
    • Exploits the tendency of deep models to learn the clean examples in noisy data first
    Consistency-based data filtering (a sketch follows below):
    1. Train a model on the noisy 1.3B-pair dataset
    2. Sample 1M passages at random
    3. For each query, use the model from step 1 to score the positive passage and each sampled passage
    4. Keep only the pairs whose positive passage ranks within the top 2 by similarity
    (The follow-up GTE model does not apply this filtering.)
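A minimal sketch of steps 3–4 of the consistency filter. `model.encode` is a hypothetical embedding function for the model trained in step 1, and the random pool is whatever list is passed in rather than the full 1M passages; the rule itself (keep the pair only if the positive ranks in the top 2) follows the slide.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def keep_pair(model, query, positive, sampled_passages, top_k=2):
    """Consistency-based filtering: keep (query, positive) only if the positive
    ranks within the top_k most similar candidates among itself + random samples."""
    q = model.encode(query)                         # hypothetical API of the step-1 model
    sims = [cosine(q, model.encode(p)) for p in [positive] + list(sampled_passages)]
    rank = 1 + sum(s > sims[0] for s in sims[1:])   # 1 = the positive scores highest
    return rank <= top_k
```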
22. Training Method
    • A two-stage recipe
    1. Contrastive Pre-training
      • Train on CCPairs with the usual contrastive loss and a very large batch size
        • The noisier the data, the larger the batch size apparently should be
      • A prefix indicating the input's role is added: "query:" and "passage:" (see the example below)
    2. Fine-tuning
      • Fine-tune on manually labeled datasets
        • A knowledge distillation loss is used in addition to the contrastive loss
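The two prefixes amount to a one-line preprocessing step. The strings "query: " and "passage: " come from the slide; the helper itself is just illustrative.

```python
def add_e5_prefix(text: str, is_query: bool) -> str:
    """Prepend the role prefix that E5 expects on its inputs."""
    return ("query: " if is_query else "passage: ") + text

print(add_e5_prefix("what is contrastive learning", is_query=True))
print(add_e5_prefix("Contrastive learning trains an encoder by pulling positives together.", is_query=False))
```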
23. Training Method: Contrastive Pre-training
    • Similarities are computed between query embeddings and passage embeddings
      • Train so that the query–positive similarity is higher than the query–negative similarities
      • In other words: maximize the diagonal of the query–passage similarity matrix
    • The other examples in the same batch serve as negatives: in-batch negatives (see the sketch below)
    (Figure: the query and the positive passage are encoded separately, which is why this setup is
     called a Dual Encoder; the weights are shared (it is the same model), the similarity is typically
     cosine similarity, and the embeddings of each positive pair are pulled together across the batch.)
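A minimal sketch of the contrastive pre-training objective with in-batch negatives, assuming query and passage embeddings from the same weight-shared encoder; maximizing the diagonal of the cosine-similarity matrix is implemented as a row-wise cross-entropy. The temperature 0.01 matches the training details shown later; everything else is illustrative (the paper may also add the symmetric passage-to-query term).

```python
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q_emb, p_emb, temperature=0.01):
    """q_emb, p_emb: (batch, dim) embeddings of queries and their positive passages.
    Passage j != i acts as a negative for query i (in-batch negatives)."""
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(p_emb, dim=-1)
    sim = q @ p.T / temperature              # (batch, batch) cosine similarities
    labels = torch.arange(sim.size(0))       # the positive for query i sits on the diagonal
    return F.cross_entropy(sim, labels)

loss = in_batch_contrastive_loss(torch.randn(8, 1024), torch.randn(8, 1024))
```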
24. Training Method: Fine-tuning
    • When fine-tuning on labeled data, data quality matters
    1. Hard negatives
      • Difficult examples that are not obviously wrong at first glance
        • They strengthen the model's representations and help it capture fine-grained information
      • For MS-MARCO and Natural Questions (NQ), negatives are mined and used
    2. Combining contrastive learning with knowledge distillation
      • The output of a Cross-Encoder is used as the teacher to enrich the supervision signal
      • Multi-task training combines the contrastive loss with a knowledge distillation loss (see the sketch below)
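A minimal sketch of the multi-task fine-tuning objective, assuming per-query candidate lists containing the positive plus mined hard negatives: a contrastive term over the student's similarities and a KL-based distillation term toward the Cross-Encoder's scores, mixed with the weight α = 0.2 from the settings table. The exact formulation in the paper may differ.

```python
import torch
import torch.nn.functional as F

def finetune_loss(student_scores, teacher_scores, alpha=0.2, temperature=0.01):
    """student_scores: (batch, n_candidates) similarities from the dual encoder,
    teacher_scores: (batch, n_candidates) relevance scores from the Cross-Encoder.
    Candidate 0 is assumed to be the positive; the rest are hard negatives."""
    labels = torch.zeros(student_scores.size(0), dtype=torch.long)
    contrastive = F.cross_entropy(student_scores / temperature, labels)
    distill = F.kl_div(F.log_softmax(student_scores, dim=-1),
                       F.softmax(teacher_scores, dim=-1),
                       reduction="batchmean")
    return distill + alpha * contrastive

loss = finetune_loss(torch.randn(4, 8), torch.randn(4, 8))
```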
25. Summary: How E5 Is Built
    • Unfiltered Corpus → Consistency-based Filtering → CCPairs
    • Masked LM (encoder-only) → Contrastive Pre-training on CCPairs → E5-PT
    • E5-PT → Contrastive Fine-tuning on Labeled Data, with Knowledge Distillation from a Reranker → E5
26. Summary: How E5 Is Built (continued)
    • Same pipeline as the previous slide, now also showing the components behind the distillation
      teacher (Retriever 1, Retriever 2 → Reranker)
    • The procedure for building the reranker is not described at all in the original paper;
      see the preceding SimLM paper for the details
    • A great deal of engineering effort goes into every stage
27. Model Settings and Training Details
    • Pooling method: Average Pooling (take the mean of the output embeddings)
      • A Transformer outputs a sequence of vectors; pooling is the operation that turns it into a single vector (a sketch follows below)
    E5-large settings (pre-training / fine-tuning):
    #GPUs (V100):     64 / 8
    batch size:       32,000 / 256
    max length:       128 / 192
    #iterations:      20,000 steps / 3 epochs
    temperature τ:    0.01 / 0.01
    loss weight α:    N/A / 0.2
    #hard negatives:  N/A / 7
    dataset:          CCPairs / MS-MARCO, NQ, NLI
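A minimal sketch of average pooling over a Transformer's token embeddings, with an attention mask so that padding positions are excluded from the mean; shapes and names are illustrative.

```python
import torch

def average_pool(last_hidden_state, attention_mask):
    """last_hidden_state: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    Returns one (batch, dim) embedding: the mean over non-padding token vectors."""
    mask = attention_mask.unsqueeze(-1).float()
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

emb = average_pool(torch.randn(2, 5, 768), torch.ones(2, 5, dtype=torch.long))
print(emb.shape)  # torch.Size([2, 768])
```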
28. Model Settings and Training Details (continued)
    • Same settings table as the previous slide; note the temperature parameter
      • The temperature 0.01 is smaller than SimCSE's 0.05
      • With a small temperature, the model does not have to sharpen the similarity distribution that much on its own
      • Possibly a concession to training on diverse data (so the model is not forced to push similarities up artificially)
29. A Note on the Temperature Parameter: Intuition
    • One of the most important hyperparameters in contrastive learning
    • It rescales the pre-Softmax values and thereby changes the shape of the post-Softmax distribution (see the demo below)
    • High temperature (e.g. 10): the predicted distribution becomes flat, so the model has to work hard to sharpen it
    • Low temperature (e.g. 0.01): the distribution becomes sharply peaked even without the model pushing hard
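A tiny demo of how the temperature reshapes the softmax output; the scores are toy values.

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([2.0, 1.0, 0.5, 0.1])   # toy pre-softmax similarities

for tau in (10.0, 1.0, 0.01):
    probs = F.softmax(scores / tau, dim=-1)
    print(f"tau={tau:<5} -> {[round(p, 3) for p in probs.tolist()]}")
# tau=10 gives an almost flat distribution;
# tau=0.01 puts nearly all the mass on the largest score.
```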
30. Experimental Results: BEIR 🍺
    • Outperforms existing methods such as SimCSE and Contriever
      • Compared with those methods, E5 invests far more effort in dataset construction
    • Contrastive pre-training alone is already quite strong
      • Pre-training tailored to text embeddings is effective
    • E5's fine-tuning datasets are limited
      • There are still a fair number of tasks where it loses to existing methods
      • Diversifying and scaling up the data should yield further gains
    Average BEIR scores (many rows omitted; see the original paper for the full results):
    Unsupervised: BM25 41.7,  SimCSE-base 20.3,  Contriever (unsup.) 36.0,  E5-PT-large 44.2
    Supervised:   Contriever (sup.) 46.6,  ColBERT 44.4,  E5-large 50.0
31. Analysis: Diversity of the Fine-tuning Datasets
    • Performance is evaluated on MTEB while varying the datasets used for fine-tuning
    • Fine-tuning beats contrastive pre-training alone in average performance
      • However, fine-tuning only on the NLI dataset actually degrades retrieval performance
      • Retrieval + QA data improves performance considerably, while NLI is needed to maximize STS performance
    • Mixing everything gives the best average performance: diversity matters
    (Presenter's opinion: sentence embedding models tuned only on NLI are not well suited for retrieval.)