Sparse, Dense, and Attentional Representations for Text Retrieval


Scatter Lab Inc.

August 28, 2020

Transcript

  1. Sparse, Dense, and Attentional Representations for Text Retrieval (구상준, ML Scientist)

  2. Contents

    • Introduction
    • Analyzing Dual Encoder Retrieval
    • Rank Preservation over Dense Model (Projection)
    • Rank Preservation over Sparse Model
    • Rank Preservation over Attention Model
    • Experiment & Analysis
  3. Introduction

  4. Using Encoders for the Retrieval Task

    • How can we pick out the passages that are relevant to a given query?
    • Use a sparse model such as TF-IDF to select a first set of candidate documents.
    • Pass the query and each candidate document through a dense encoder, then use the resulting vectors to extract the answer.
    Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering, Lee et al., ICLR 2019
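The two-stage pipeline above can be sketched as follows. This is a minimal sketch under stated assumptions: documents are pre-tokenized lists, the first stage is plain TF-IDF with a smoothed IDF, and the second-stage "dense encoder" is a hypothetical stand-in (a fixed random projection of the TF-IDF vector), not a trained model. The function names are illustrative, not from the talk or the cited paper.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs, vocab):
    """TF-IDF matrix (n_docs x |vocab|) and the smoothed IDF vector used for it."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = np.array([math.log((n + 1) / (df[t] + 1)) + 1.0 for t in vocab])
    tf = np.array([[c[t] for t in vocab] for c in map(Counter, docs)], dtype=float)
    return tf * idf, idf


def two_stage_search(query, docs, n_candidates=2, dim=8, seed=0):
    """Stage 1: TF-IDF picks candidates. Stage 2: a dense stand-in re-ranks them."""
    vocab = sorted({t for d in docs for t in d} | set(query))
    doc_mat, idf = tfidf_matrix(docs, vocab)
    q_counts = Counter(query)
    q_vec = np.array([q_counts[t] for t in vocab], dtype=float) * idf

    # Stage 1: sparse scores; keep the top n_candidates documents.
    sparse_scores = doc_mat @ q_vec
    candidates = np.argsort(-sparse_scores)[:n_candidates]

    # Stage 2: hypothetical dense encoder (a fixed random projection, NOT a trained model).
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(dim, len(vocab))) / math.sqrt(dim)
    q_dense = proj @ q_vec
    dense_scores = {int(i): float((proj @ doc_mat[i]) @ q_dense) for i in candidates}
    return sorted(dense_scores, key=dense_scores.get, reverse=True)
```

In a real system the second stage would be a trained encoder; the random projection only keeps the sketch self-contained. The point is the shape of the pipeline: a cheap sparse filter first, then a more expensive dense comparison on the few survivors.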
  5. Bi-Encoder for Retrieval

    • Cross-Encoder vs. Bi-Encoder (Dual Encoder)
    • Cross-Encoder: concatenates the query and a candidate into a single input and classifies the pair.
    • Bi-Encoder: encodes the query and the candidate separately (each with its own encoder), then computes their similarity.
    Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring, Humeau et al., ICLR 2020
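A toy sketch of the practical difference between the two designs, assuming hypothetical `encode` and `cross_score` functions in place of real BERT models. The bi-encoder encodes every document once, ahead of time, and scores a query with one matrix-vector product; the cross-encoder has to run its joint model on every (query, document) pair at query time.

```python
import numpy as np

DIM = 16


def encode(text):
    """Hypothetical encoder: hashed bag-of-words into DIM buckets (deterministic)."""
    v = np.zeros(DIM)
    for tok in text.lower().split():
        v[sum(map(ord, tok)) % DIM] += 1.0
    return v


def cross_score(query, doc):
    """Hypothetical cross-encoder: sees query and document together (here, shared tokens)."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


class BiEncoderIndex:
    def __init__(self, docs):
        # Documents are encoded once, offline, without seeing any query.
        self.doc_matrix = np.stack([encode(d) for d in docs])

    def search(self, query, k=1):
        # One encoder call per query; scoring all documents is one matrix-vector product.
        scores = self.doc_matrix @ encode(query)
        return [int(i) for i in np.argsort(-scores)[:k]]


def cross_encoder_search(query, docs, k=1):
    # The joint model must run once for every (query, document) pair at query time.
    scores = np.array([cross_score(query, d) for d in docs])
    return [int(i) for i in np.argsort(-scores)[:k]]
```

The trade-off this illustrates: the bi-encoder's cost per query is independent of how many documents were pre-encoded (apart from the final dot products), while the cross-encoder can model query-document interactions directly but pays one full model call per candidate.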
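  6. Effectiveness of Density

    • Dense models generally outperform sparse models, but...
    • ...this was found not necessarily to hold for long documents.
    • Why?
      • Is the dimension too small, so the model lacks the capacity to hold a document's meaning?
      • Or does the model lack the ability to generalize over documents (generality)?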
  7. Effectiveness of Density

    • Key points of the paper:
      • To outperform a sparse model, a dense model needs a large embedding dimension.
      • The required dimension is determined by document length and vocabulary size.
      • Comparing a sparse model, random projection, and an attention model (cross-encoder):
        • random projection performs quite well even at small dimensions, and
        • the attention model needs only a small dimension but requires much more computation.
  8. Analyzing Dual Encoder Retrieval

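  9. Mathematical Representation for Encoders

    • Assume 1-hot (bag-of-words) representations $q, d$ of an arbitrary query and document, and an encoder function $f$ for them.
    • Assume scores are assigned by inner-product similarity: $\langle q, d \rangle$ and $\langle f(q), f(d) \rangle$.
    [Figure: the query "When was Walt Disney born?" is turned into the 1-hot vector q and, through an encoder (LSTM, BERT), into f(q); the document "Walt Elias Disney ..." becomes d and f(d).]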
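The setup on this slide can be made concrete with a small sketch. The vocabulary, the example tokens, and the random Gaussian matrix used as the encoder f are all illustrative assumptions; a real f would be an LSTM or BERT encoder and would not be linear.

```python
import numpy as np


def one_hot_bow(tokens, vocab):
    """1-hot (multi-hot) vector: 1.0 at the index of every vocabulary term present."""
    index = {t: i for i, t in enumerate(vocab)}
    v = np.zeros(len(vocab))
    for t in tokens:
        v[index[t]] = 1.0
    return v


def make_linear_encoder(k, v, seed=0):
    """Hypothetical encoder f(x) = A x with a random Gaussian A of shape (k, v)."""
    A = np.random.default_rng(seed).normal(size=(k, v)) / np.sqrt(k)
    return lambda x: A @ x


vocab = ["walt", "elias", "disney", "when", "was", "born", "1901"]
q = one_hot_bow(["when", "was", "walt", "disney", "born"], vocab)
d = one_hot_bow(["walt", "elias", "disney", "was", "born", "1901"], vocab)
f = make_linear_encoder(k=4, v=len(vocab))

exact_score = q @ d            # <q, d>: number of shared terms (4 here)
encoded_score = f(q) @ f(d)    # <f(q), f(d)>: a k-dimensional approximation
```

With 1-hot vectors, $\langle q, d \rangle$ simply counts shared terms; the question the rest of this section asks is how large k must be for $\langle f(q), f(d) \rangle$ to keep the same ordering over documents.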
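  10. Rank Preservation over Dense Model

    • For any $d_1, d_2$, a "rank-preserving" encoder function $f$ satisfies:
      $\langle q, d_1 \rangle > \langle q, d_2 \rangle \Rightarrow \langle f(q), f(d_1) \rangle > \langle f(q), f(d_2) \rangle$
    • For any $d_1, d_2$, an "ε-precise" encoder function satisfies, for each $d \in \{d_1, d_2\}$:
      $\big\lvert\, \lVert f(q) - f(d) \rVert^2 - \lVert q - d \rVert^2 \,\big\rvert \le \epsilon \cdot \lVert q - d \rVert^2$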
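Both definitions can be checked directly for a given encoder and a given (q, d1, d2) triple. A minimal sketch, assuming f is any Python function mapping a NumPy vector to a NumPy vector, and reading the ε-precision condition with squared Euclidean distances; the function names are hypothetical.

```python
import numpy as np


def preserves_rank(f, q, d1, d2):
    """True if <q,d1> > <q,d2> implies <f(q),f(d1)> > <f(q),f(d2)> for this triple."""
    if q @ d1 > q @ d2:
        fq = f(q)
        return bool(fq @ f(d1) > fq @ f(d2))
    return True  # premise is false: there is no ordering to preserve


def is_eps_precise(f, q, d, eps):
    """True if | ||f(q)-f(d)||^2 - ||q-d||^2 | <= eps * ||q-d||^2."""
    orig = float(np.sum((q - d) ** 2))
    enc = float(np.sum((f(q) - f(d)) ** 2))
    return abs(enc - orig) <= eps * orig
```

For example, f(x) = 2x preserves every ranking but is only 3-precise (it quadruples squared distances), while f(x) = 0 is neither: a map can preserve rank without being precise, but it cannot collapse the space and still preserve rank.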
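  12. Rank Preservation over Dense Model

    • If an encoder's error rate is inversely proportional to the summed magnitude of the vocabulary components, the encoder preserves rank.
    • Put simply, the larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.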
  13. Rank Preservation over Projection

    • Projection method: a technique that reduces dimensionality by mapping vectors onto random hyperplanes.
    • Each vector is described by whether it lies in the positive or the negative half-space of each hyperplane.
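One way to read the hyperplane description, sketched under assumptions: draw k random hyperplanes through the origin (the rows of a hypothetical Gaussian matrix W are their normal vectors). Projecting x onto the normals gives k signed distances, which form the reduced representation, and the sign of each says which half-space x lies in. This is the standard sign-random-projection idea, not necessarily the exact construction used in the talk.

```python
import numpy as np


def hyperplane_codes(X, k, seed=0):
    """Project each row of X onto k random hyperplanes through the origin.

    Returns (proj, signs): proj[i, j] is the (unnormalized) signed distance of
    X[i] to hyperplane j; signs[i, j] is +1 for the positive half-space, -1 otherwise.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(k, X.shape[1]))  # row j = normal vector of hyperplane j
    proj = X @ W.T                         # reduced k-dimensional representation
    signs = np.where(proj >= 0, 1, -1)
    return proj, signs
```

Vectors that point in similar directions fall on the same side of most hyperplanes, so their similarity survives the reduction; opposite vectors land on opposite sides of every hyperplane.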
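  15. Rank Preservation over Projection

    • Encoder f as a dimension-reducing linear map: suppose the reduction (encoder) f is represented by a single matrix, i.e. $f(x) = Ax$, and maps n documents.
    • Rademacher embedding, Gaussian embedding: each entry of A is drawn at random.
    • The reduced dimension k needed for a good A is inversely proportional to $\epsilon^2/2 - \epsilon^3/3$ and proportional to $\log(n)$, i.e. $k = O\big(\log(n) / (\epsilon^2/2 - \epsilon^3/3)\big)$.
    • Put simply, the reduced dimension must be reasonably large, and how large depends on the number of documents.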
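The dimension requirement can be made concrete with the Dasgupta-Gupta form of the Johnson-Lindenstrauss lemma, k ≥ 4 ln(n) / (ε²/2 − ε³/3), together with the two random embeddings named on the slide. A sketch, assuming entries are scaled by 1/√k so that squared distances are preserved in expectation; the function names are hypothetical.

```python
import math

import numpy as np


def jl_min_dim(n, eps):
    """Smallest k with k >= 4 ln(n) / (eps^2/2 - eps^3/3) (Dasgupta-Gupta form of JL)."""
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))


def random_embedding(k, v, kind="gaussian", seed=0):
    """Random k x v matrix A, scaled by 1/sqrt(k), for the encoder f(x) = A x."""
    rng = np.random.default_rng(seed)
    if kind == "gaussian":
        A = rng.normal(size=(k, v))                # entries ~ N(0, 1)
    elif kind == "rademacher":
        A = rng.choice([-1.0, 1.0], size=(k, v))   # entries +1 / -1 with prob 1/2
    else:
        raise ValueError(f"unknown kind: {kind}")
    return A / math.sqrt(k)
```

For example, `jl_min_dim(10**6, 0.1)` is roughly 1.2 × 10^4, which illustrates the slide's point: the reduced dimension must be fairly large, and it grows with the (log of the) number of documents.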
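  17. Rank Preservation over Sparse Model

    • Consider a TF-IDF model: $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$
    • TF-IDF similarity can be written in the form $\langle \bar{q}, d \rangle$.
    • BM25 can be written in the form $\langle \bar{q}, \bar{d} \rangle = \mathrm{BM25}(q, d)$.
    [Figure: the query tokens form the 1-hot vector q, which is weighted by IDF to give $\bar{q}$.]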
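Both identities on this slide can be checked numerically. A minimal sketch, assuming pre-tokenized text, one common BM25 IDF variant, ln(1 + (N − df + 0.5)/(df + 0.5)), and the usual default parameters k1 = 1.2, b = 0.75; these choices are assumptions, not taken from the talk. Here d̄ holds the saturated, length-normalized term frequencies, so the inner product ⟨q̄, d̄⟩ reproduces the usual BM25 sum over query terms.

```python
import math
from collections import Counter

import numpy as np


def idf_vector(docs, vocab):
    """One common BM25 IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5))."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return np.array([math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in vocab])


def counts(tokens, vocab):
    c = Counter(tokens)
    return np.array([c[t] for t in vocab], dtype=float)


def tfidf_score(q_tokens, d_tokens, idf, vocab):
    q_bar = counts(q_tokens, vocab) * idf           # q_bar_i = q_i * IDF_i
    return float(q_bar @ counts(d_tokens, vocab))   # <q_bar, d>


def bm25_score(q_tokens, d_tokens, idf, vocab, avgdl, k1=1.2, b=0.75):
    q_bar = counts(q_tokens, vocab) * idf
    tf = counts(d_tokens, vocab)
    norm = k1 * (1 - b + b * len(d_tokens) / avgdl)
    d_bar = tf * (k1 + 1) / (tf + norm)              # saturated, length-normalized TF
    return float(q_bar @ d_bar)                      # <q_bar, d_bar> = BM25(q, d)
```

Writing BM25 as an inner product is what lets the sparse model be analyzed with the same tools as the dense encoders: it is a dual encoder whose "embedding" is a vocabulary-sized vector.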
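  19. Rank Preservation over Sparse Model

    • Before the analysis, define the normalized margin (the larger it is, the more clearly the two documents differ in how well they match the query):
      $\delta(q, d_1, d_2) = \dfrac{q \cdot (d_1 - d_2)}{\lVert q \rVert \cdot \lVert d_1 - d_2 \rVert}$
    • Suppose a dimension-reducing matrix A represents TF-IDF (i.e. $\bar{q} = Aq$, $\bar{d} = Ad$).
    • The error rate of this mapping is bounded by $4\exp\big(-k(\delta^2 - \delta^3)/4\big)$.
    • Put simply, to reduce the error rate, a sufficiently large dimension must be guaranteed.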
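The normalized margin and the bound can be computed, and the bound compared with an empirical estimate of how often a random projection flips the order of d1 and d2. A sketch under assumptions: A has i.i.d. N(0, 1/k) entries, the bound is the expression quoted on the slide, and the function names are hypothetical.

```python
import math

import numpy as np


def normalized_margin(q, d1, d2):
    """delta(q, d1, d2) = q . (d1 - d2) / (||q|| * ||d1 - d2||)."""
    diff = d1 - d2
    return float(q @ diff / (np.linalg.norm(q) * np.linalg.norm(diff)))


def rank_error_bound(k, delta):
    """The slide's bound 4 exp(-k (delta^2 - delta^3) / 4) on the rank-flip probability."""
    return 4 * math.exp(-k * (delta ** 2 - delta ** 3) / 4)


def empirical_flip_rate(q, d1, d2, k, trials=500, seed=0):
    """Fraction of random Gaussian A (k x v) with <Aq, Ad1> <= <Aq, Ad2>."""
    rng = np.random.default_rng(seed)
    flips = 0
    for _ in range(trials):
        A = rng.normal(size=(k, q.shape[0])) / math.sqrt(k)
        aq = A @ q
        flips += int(aq @ (A @ d1) <= aq @ (A @ d2))
    return flips / trials
```

The bound shrinks exponentially in k but only slowly when δ is small, which is why documents that differ only slightly with respect to the query demand a larger dimension.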
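  21. Rank Preservation over Attention Model

    • For $x = (x_1, x_2, \dots)$ and $y = (y_1, y_2, \dots)$, the inner product is defined using cross-attention between the tokens of x and y.
    • Put simply, the dimension that must be guaranteed is proportional to the square of the query length in tokens.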
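  25. Rank Preservation Summary

    • The larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.
    • When reducing dimension in the form $Ax$, the dimension must be reasonably large, and it grows with the log of the number of documents.
    • For a TF-IDF-style sparse model, a sufficiently large dimension must be guaranteed to reduce the error rate.
    • With cross-attention, the required dimension is proportional to the square of the query length in tokens.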
  26. Experiment

  27. Experiment: ICT over Wikipedia

    • Method: Inverse Cloze Test (ICT)
      • Split a whole passage into several parts.
      • Use one part as the query and the remaining parts as the document.
      • The paper generates 1M queries from Wikipedia.
      • Recall is measured for both ranking and retrieval.
    • Compared models:
      • Cross-Attention, Sum-of-max
      • Dual-Encoder BERT
      • Multi-Vector BERT
      • Sparse Model (BM25)
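A minimal sketch of ICT pair generation and of the recall metric, assuming a passage's "parts" are its sentences and that each ranking is a list of document ids; the function names are hypothetical and this is not the paper's data pipeline.

```python
import random


def ict_pairs(passages, seed=0):
    """Inverse Cloze Test: one sentence becomes the query, the rest the document.

    passages: list of passages, each given as a list of sentences.
    Passages with fewer than two sentences are skipped.
    """
    rng = random.Random(seed)
    pairs = []
    for sentences in passages:
        if len(sentences) < 2:
            continue
        i = rng.randrange(len(sentences))
        query = sentences[i]
        document = " ".join(sentences[:i] + sentences[i + 1:])
        pairs.append((query, document))
    return pairs


def mean_recall_at_k(rankings, gold_ids, k):
    """Fraction of queries whose gold document id appears in the top-k ranking."""
    hits = [gold in ranked[:k] for ranked, gold in zip(rankings, gold_ids)]
    return sum(hits) / len(hits)
```

Because the query is cut out of the very passage it must retrieve, ICT rewards lexical and topical overlap and needs little generalization, which matches how the results below are interpreted.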
  28. Experiment: ICT over Wikipedia

    • Results:
      • The smaller the embedding, the worse the performance, and the drop is most pronounced in retrieval.
      • In retrieval, BM25 and Multi-Vector perform better than expected.
  29. Experiment: Open-Domain QA

    • Method: Natural Questions dataset
      • Consists of question-answer pairs that ask about actual Wikipedia content.
      • Trained on 87,925 examples and evaluated on 3,610.
    • Compared models:
      • Cross-Attention, Sum-of-max
      • Dual-Encoder BERT, Hybrid Dual-Encoder BERT (linear combination of sparse and dense)
      • Multi-Vector BERT
      • Sparse Model (BM25)
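One way to realize a linear combination of sparse and dense scores while keeping a dual encoder: concatenate each side's dense vector with a scaled sparse vector, so a single inner product yields dense + λ·sparse. A sketch, where the weight λ (`lam`) is a hypothetical hyperparameter and the vectors are placeholders for real encoder outputs.

```python
import numpy as np


def hybrid_vector(dense, sparse, lam):
    """Concatenate [dense; sqrt(lam) * sparse]; lam >= 0 is a hypothetical weight."""
    return np.concatenate([dense, np.sqrt(lam) * sparse])


def hybrid_score(q_dense, q_sparse, d_dense, d_sparse, lam=0.5):
    # <[a; sqrt(lam) s_q], [b; sqrt(lam) s_d]> = <a, b> + lam * <s_q, s_d>,
    # so the hybrid is still a single inner product (a dual encoder).
    return float(hybrid_vector(q_dense, q_sparse, lam) @ hybrid_vector(d_dense, d_sparse, lam))
```

Keeping the combination inside one inner product means the same maximum-inner-product search index can serve the hybrid model.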
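  30. Experiment: Open-Domain QA

    • Results:
      • Models with small embeddings still perform worse.
      • On real human questions, BM25 answers poorly, which indicates a lack of generalization ability.
      • Compared with ICT, the difference in method matters far more than the embedding size.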
  31. Experiment: Short Answer Exact Match

    • Method: Natural Questions dataset
      • Same as Experiment 2, but measures the cases where the answer matches exactly.
    • Compared models:
      • DE-BERT, Hybrid-BERT, Multi-BERT (best dense)
      • Sparse Model (BM25)
    • Results:
      • The hybrid model performs well, particularly when 200 tokens are retrieved.
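Exact match is usually computed after light answer normalization. A sketch assuming SQuAD-style normalization (lowercasing, removing punctuation and English articles, collapsing whitespace); the exact normalization used in the paper is not stated in the talk, and the function names are hypothetical.

```python
import re
import string

PUNCT = set(string.punctuation)


def normalize_answer(s):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))
```

Exact match is strict: a prediction that contains the answer plus extra words (or only part of it) scores 0, which makes it a harsher end-to-end test than retrieval recall.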
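  32. Summary & Intuition

    • Summary
      • The smaller the embedding, the worse the performance; the drop is most pronounced on data that does not require generalization (ICT).
      • Sparse models perform worse on data that requires generalization (Open-Domain QA).
      • The hybrid model performs well overall, since it can take the strengths of both.
    • Intuition
      • We expect embedding search to outperform the current BM25 approach.
      • Because our data demands a lot of generalization, using a hybrid probably would not bring a large gain.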