
Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence scoring

Scatter Lab Inc.

February 27, 2020

Transcript

  1. Overview
     • With a pretrained Transformer, there have been broadly two ways to perform pairwise computation between sequences:
     • Cross-encoder: feed both sequences into a single encoder at once and run full self-attention across them. Generally the better-performing option, but too slow for practical use.
     • Bi-encoder: encode the two sequences separately and compute a score between the two representations. Generally performs worse, but is practical to deploy.
     • This paper proposes the Poly-Encoder, which is more practical than the Cross-encoder and more accurate than the Bi-encoder.
  2. Basic Setup
     • Transformers
     • Pretrained BERT (base) (by Devlin et al.)
     • Two BERT models pretrained by the authors themselves:
       • A BERT trained with the same objectives and the same dataset as the original pretrained BERT
       • A BERT trained with the same objectives on Reddit
     • Fine-grained choices such as hyperparameters follow the settings of XLM (Lample & Conneau, 2019).
     • During pretraining, the INPUT (current sentence) and LABEL (next sentence) are wrapped with the special token [S].
     • When pretraining on Reddit, the "next sentence" in the Next Sentence Prediction task may consist of multiple sentences.
  3. Bi-Encoder
     • The Context and the Reply Candidate are encoded with separate BERTs.
     • Both start from the same weights, but the two BERTs are updated differently during training.
     • Reduction: ways to reduce BERT's sequence output to a single vector:
       1. Use the first token ([S])
       2. Average the per-token outputs
       3. Average the tokens from the first up to the m-th
     • Experiments showed that using only the first token performed best.
     • Score: the dot product of the two encoders' outputs.
     • During training, the other reply candidates in the same batch are used as negatives.
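The three reduction options above can be sketched in a few lines; a minimal NumPy illustration (the function and option names are mine, not from the paper):

```python
import numpy as np

def reduce_sequence(h, method="first", m=16):
    """Candidate schemes for reducing BERT's (N, d) sequence output
    to one d-dim vector; the paper found "first" works best."""
    if method == "first":      # 1. the [S] token's output only
        return h[0]
    if method == "mean":       # 2. average over all tokens
        return h.mean(axis=0)
    if method == "first_m":    # 3. average over the first m tokens
        return h[:m].mean(axis=0)
    raise ValueError(f"unknown method: {method}")

# toy (N=4 tokens, d=3 dims) sequence output
h = np.arange(12.0).reshape(4, 3)
vec = reduce_sequence(h, "mean")
```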
  4. Cross-Encoder
     • The Context and the Reply Candidate are concatenated with [S] and fed into BERT together.
     • Token-level attention between Context and Reply is possible at every Transformer layer, which benefits accuracy.
     • Score: Score(C, R) = y_{C,R} · W (a linear projection of the first token's output to a scalar)
     • Drawback: reply candidates cannot be precomputed.
     • This model structure is therefore hard to use in fields such as IR.
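To make the precompute limitation concrete, a toy sketch counting encoder forward passes at query time (the dummy `encode` and all names here are mine, purely illustrative):

```python
# `encode` stands in for one full BERT forward pass.
calls = 0

def encode(tokens):
    global calls
    calls += 1
    return float(len(tokens))  # dummy one-number "representation"

context = ["hello", "how", "are", "you"]
candidates = [["fine"], ["good", "thanks"], ["bye"]]

# Cross-encoder: every (context, candidate) pair needs its own forward pass.
cross_scores = [encode(context + c) for c in candidates]
cross_calls = calls  # grows linearly with the candidate pool

# Bi-encoder: candidate vectors are precomputed once, offline...
calls = 0
cached = [encode(c) for c in candidates]
# ...so at query time only the context itself is encoded.
calls = 0
c_vec = encode(context)
bi_scores = [c_vec * r for r in cached]
bi_calls = calls  # 1, regardless of how many candidates there are
```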
  5. Poly-Encoder (1/2)
     • The Reply is encoded as the pooled output; the Context is encoded as the sequence output.
     • The Context is usually much longer than the Reply, but the Bi-Encoder reduces the Context to a single vector, causing substantial information loss.
     • Representing the Context with m vectors should reduce that loss.
  6. Poly-Encoder (2/2)
     • Attention computation:
     1. Define m context codes (c_1, ..., c_m) and compute
        y_ctxt^i = Σ_j w_j^{c_i} h_j, where (w_1^{c_i}, ..., w_N^{c_i}) = softmax(c_i · h_1, ..., c_i · h_N)
        • That is, attention in which the context codes are the queries and the Context's sequence output (h_1, ..., h_N) is the keys and values.
     2. The Reply's pooled output attends over the m context vectors obtained in step 1:
        y_ctxt = Σ_i w_i y_ctxt^i, where (w_1, ..., w_m) = softmax(y_cand · y_ctxt^1, ..., y_cand · y_ctxt^m)
        • That is, attention in which the Reply vector is the query and the context vectors are the keys and values.
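The two-stage attention above can be written compactly; a NumPy sketch under the slide's definitions (function and variable names are mine):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def poly_encoder_score(h, codes, y_cand):
    """Two-stage Poly-Encoder attention.

    h      : (N, d) Context sequence output from BERT
    codes  : (m, d) learned context codes c_1..c_m
    y_cand : (d,)   pooled Reply representation
    """
    # Stage 1: each code attends over the N context token outputs
    y_ctxt_i = np.stack([softmax(c @ h.T) @ h for c in codes])  # (m, d)
    # Stage 2: the reply vector attends over the m context vectors
    w = softmax(y_ctxt_i @ y_cand)                              # (m,)
    y_ctxt = w @ y_ctxt_i                                       # (d,)
    return y_ctxt @ y_cand                                      # dot-product score

rng = np.random.default_rng(0)
score = poly_encoder_score(rng.normal(size=(10, 16)),   # N=10 tokens
                           rng.normal(size=(4, 16)),    # m=4 codes
                           rng.normal(size=16))
```

With m identical codes, stage 1 yields m identical context vectors, so the score reduces to the single-code case.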
  7. Training Strategy
     • What a next-utterance-retrieval model wants to estimate: P(R|C)
     • If we frame it as simple binary classification: P(R|C) = P(L = 1 | R, C), where L ∈ {0, 1}
     • The strategy used in this paper:
       P(R|C) = P(R, C) / Σ_k P(R_k, C) ≈ e^{S(R,C)} / Σ_k^N e^{S(R_k,C)}
       (where N is the training batch size)
     • S(R, C) is any metric that can score a pair of sequences (dot product in this paper).
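The batch-softmax objective above takes only a few lines; a NumPy sketch (names are mine) with the positive pairs on the diagonal of the score matrix:

```python
import numpy as np

def retrieval_loss(scores):
    """Negative log-likelihood of e^{S(R_i,C_i)} / sum_k e^{S(R_k,C_i)}.

    scores[i, k] = S(R_k, C_i); row i's positive reply is R_i, and the
    other N-1 replies in the batch act as negatives."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# toy batch of N=3 pairs: dot-product scores between context and reply vectors
rng = np.random.default_rng(0)
C = rng.normal(size=(3, 8))
loss = retrieval_loss(C @ C.T)  # matching pairs score highest on the diagonal
```

When all scores are equal, the loss degenerates to log N, the entropy of a uniform guess over the batch.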
  8. Tasks
     • Experiments are run in the Dialogue and Article Search domains.
     • NeurIPS ConvAI2
       • Facebook's Persona-Chat dataset
     • DSTC7 Track 1
       • Ubuntu chat logs
     • Ubuntu V2 corpus
       • A somewhat larger dataset than the one provided in DSTC7 Track 1
     • Wikipedia Article Search
       • Finding articles relevant to a search query in an English Wikipedia dump
  9. Experiments (Bi- or Cross-Encoder)
     • Performance as a function of training batch size
     • Performance as a function of which parameters are updated during fine-tuning
  10. Experiments (Poly-Encoder)
      • Setup:
      • Optimizer: Adamax
      • Unlike the Bi- and Cross-Encoder, all layers are trained.
      • BERT's last linear layer is re-scaled (the output values are multiplied by a specific constant).
        • Empirically, this was essential for training to succeed.
      • Experiments vary the number of context codes m.
      • All other settings are identical to the Bi-Encoder.
  11. Experiments (Poly-Encoder)
      • Prediction performance (see the paper for the full tables): Cross-Encoder > Poly-Encoder > Bi-Encoder
      • The Poly-Encoder improves as the number of codes increases.
      • Performance by pretrained BERT:
        • Our BERT pretrained on Reddit > Our BERT pretrained on Toronto Books + Wiki > Pretrained-BERT (Devlin et al., 2019)
      • Even when using Pretrained-BERT (Devlin et al., 2019), the models narrowly beat the previous SOTA.
  12. Experiments (Poly-Encoder)
      • Inference Time
  13. Conclusion
      • Proposes a model architecture and pretraining strategy for solving candidate selection tasks with BERT.
      • Poly-Encoder
        • Improves accuracy by attending over the Context representations, while preserving the existing precompute-friendly structure that enables fast prediction.
  14. Appendix (1/2)
      • Training Time
      • Reduction Layer in Bi-Encoder
  15. Appendix (2/2)
      • Performance by choice of Context vectors in the attention with the code vectors:
      • Use all of BERT's sequence output
      • Use only the first m vectors of BERT's sequence output
      • Use only the last m vectors of BERT's sequence output
      • Use the last m vectors plus the first token (<CLS>) of BERT's sequence output