Sparse, Dense, and Attentional Representations for Text Retrieval

Scatter Lab Inc.

August 28, 2020

Transcript

  1. Contents

    • Introduction
    • Analyzing Dual Encoder Retrieval
    • Rank Preservation over Dense Model (Projection)
    • Rank Preservation over Sparse Model
    • Rank Preservation over Attention Model
    • Experiment & Analysis
  2. Using Encoders over Retrieval Task (Introduction)

    • Given a query, how can we pick out the relevant passages?
    • A sparse model such as TF-IDF selects the first-stage candidate documents
    • The query and each candidate document are run through a dense encoder -> the mapped vectors are used to extract the answer
    Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering, Lee et al., ICLR 2019
  3. Bi-Encoder for Retrieval (Introduction)

    • Cross-Encoder vs Bi-Encoder (Dual Encoder)
    • Cross-Encoder: the query and the candidate are joined into a single input, which is then classified
    • Bi-Encoder: the query and the candidate are each mapped by a separate encoder, and a similarity is computed between the results
    Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring, Humeau et al., ICLR 2020
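  4. Effectiveness of Density (Introduction)

    • Dense models generally outperform sparse models, but...
    • For long texts, this was found not to hold in every case
    • Why?
      • Is the dimension too small, leaving too little capacity to hold the meaning of the text?
      • Or is the ability to generalize over text (generality) insufficient?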
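  5. Effectiveness of Density (Introduction)

    • Key points of the paper
      • To exceed the performance of a sparse model, the dimension of the dense model must be large
      • The required dimension is determined by the document length and the vocabulary size
    • In addition, when comparing the Sparse Model, Random Projection, and the Attention Model (Cross Enc):
      • Random Projection performs quite well even at small dimensions
      • The Attention Model needs only a small dimension but requires a lot of computation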
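  6. Mathematical Rep. for Encoders (Analyzing Dual Encoder Retrieval)

    • Assume 1-hot representations $q$, $d$ of an arbitrary query and document, and an encoder function $f$ for them
    • Assume scores are similarity-based: $\langle q, d \rangle$ and $\langle f(q), f(d) \rangle$
    [Figure: the query tokens "Walt", "Disney", "when", "born" form $q$, and the document tokens "Walt", "Elias", "Disney", ... form $d$; each goes through an encoder (LSTM, BERT, ...) to give $f(q)$ and $f(d)$, with scores $\langle q, d \rangle$ and $\langle f(q), f(d) \rangle$.]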
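The scoring assumption on slide 6 can be sketched in plain Python as a toy example. The vocabulary, the English tokens, and the helper names `bag_of_words` and `inner` are illustrative assumptions, not taken from the deck; only the inner-product score $\langle q, d \rangle$ comes from the slide.

```python
from collections import Counter


def bag_of_words(tokens, vocab):
    """Return a term-count vector over `vocab` (a list of terms)."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


def inner(u, v):
    """Plain inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


# Toy vocabulary and texts (assumed for illustration only).
vocab = ["walt", "elias", "disney", "when", "was", "born", "in", "1901"]
q = bag_of_words(["when", "was", "walt", "disney", "born"], vocab)
d = bag_of_words(["walt", "elias", "disney", "was", "born", "in", "1901"], vocab)

# <q, d> counts the overlapping term occurrences: walt, disney, was, born.
score = inner(q, d)
print(score)  # 4
```

A dual encoder replaces `q` and `d` with `f(q)` and `f(d)` and keeps the same `inner` call, which is why the following slides ask when that replacement preserves the ranking.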
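  7. Rank Preservation over Dense Model (Analyzing Dual Encoder Retrieval)

    • For arbitrary $d_1$, $d_2$, a "rank-preserving" encoder function $f$ satisfies:
      $\langle q, d_1 \rangle > \langle q, d_2 \rangle \Rightarrow \langle f(q), f(d_1) \rangle > \langle f(q), f(d_2) \rangle$
    • For arbitrary $q$, $d$, an "$\epsilon$-precise" encoder function satisfies:
      $\left| \lVert f(q) - f(d) \rVert^2 - \lVert q - d \rVert^2 \right| \le \epsilon \cdot \lVert q - d \rVert^2$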
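  8. Rank Preservation over Dense Model (Analyzing Dual Encoder Retrieval)

    • If an encoder's error rate is inversely proportional to the summed size of the vocabulary components, that encoder preserves rank.
    • Put simply: the larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder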
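  9. Rank Preservation over Projection (Analyzing Dual Encoder Retrieval)

    • Encoder $f$ with contraction mapping
    • Suppose a reducing map (encoder) $f$ expressed by a single matrix, i.e. $f(x) = Ax$, is used to map $n$ documents
    • Rademacher Embedding, Gaussian Embedding: each entry of $A$ is chosen at random
    • The reduced vector size $k$ needed for a good $A$ is inversely proportional to $(\epsilon^2/2 - \epsilon^3/3)$ and proportional to $\log(n)$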
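  10. Rank Preservation over Projection (Analyzing Dual Encoder Retrieval)

    • Encoder $f$ with contraction mapping
    • Suppose a reducing map (encoder) $f$ expressed by a single matrix, i.e. $f(x) = Ax$, is used to map $n$ documents
    • Rademacher Embedding, Gaussian Embedding: each entry of $A$ is chosen at random
    • The reduced vector size $k$ needed for a good $A$ is inversely proportional to $(\epsilon^2/2 - \epsilon^3/3)$ and proportional to $\log(n)$
    • Put simply: the reduced dimension has to be reasonably large, and how large depends on the number of documents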
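A minimal sketch of the random-projection encoder $f(x) = Ax$ from slides 9 and 10, using a Rademacher matrix. The constant 4 and the natural logarithm in `jl_dimension` follow the commonly cited Johnson-Lindenstrauss bound and are an assumption here, since the slide states only the proportionality. All function names and the toy sizes are hypothetical.

```python
import math
import random


def jl_dimension(n_docs, eps):
    """Dimension k from the assumed bound k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n_docs) / (eps ** 2 / 2 - eps ** 3 / 3))


def rademacher_matrix(k, dim, rng):
    """k x dim matrix whose entries are +1/sqrt(k) or -1/sqrt(k) at random."""
    scale = 1 / math.sqrt(k)
    return [[rng.choice((-scale, scale)) for _ in range(dim)] for _ in range(k)]


def project(A, x):
    """The encoder f(x) = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)
n_docs, eps, vocab_size = 1000, 0.5, 2000  # toy values, assumed
k = jl_dimension(n_docs, eps)              # 332 for these values
A = rademacher_matrix(k, vocab_size, rng)

# Two random count vectors standing in for a query and a document.
q = [rng.randint(0, 2) for _ in range(vocab_size)]
d = [rng.randint(0, 2) for _ in range(vocab_size)]
fq, fd = project(A, q), project(A, d)

# Relative error of the squared distance, the quantity bounded by eps.
distortion = abs(sq_dist(fq, fd) - sq_dist(q, d)) / sq_dist(q, d)
print(k, round(distortion, 3))
```

Because $k$ grows only with $\log(n)$, going from a thousand to a million documents doubles the suggested dimension rather than multiplying it by a thousand.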
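  11. Rank Preservation over Sparse Model (Analyzing Dual Encoder Retrieval)

    • Consider the TF-IDF model: $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$
    [Figure: the query tokens "Walt", "Disney", "when", "born" with their counts $q$, the per-term IDF weights, and the resulting weighted vector $\bar{q}$.]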
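  12. Rank Preservation over Sparse Model (Analyzing Dual Encoder Retrieval)

    • Consider the TF-IDF model: $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$
    • TF-IDF similarity can be written in the form $\langle \bar{q}, d \rangle$
    • For BM25, it can be written in the form $\langle \bar{q}, \bar{d} \rangle = \mathrm{BM25}(q, d)$
    [Figure: the query tokens "Walt", "Disney", "when", "born" with their counts $q$, the per-term IDF weights, and the resulting weighted vector $\bar{q}$.]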
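The TF-IDF form $\langle \bar{q}, d \rangle$ with $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$ can be sketched as follows. The smoothed IDF formula is one common variant and is an assumption, since the slide does not say which IDF definition is used. The toy corpus and the function names are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """Smoothed inverse document frequency (assumed variant)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1


def tfidf_score(query, doc, docs):
    """<q_bar, d>, where q_bar_i = q_i * IDF_i and d holds raw term counts."""
    q_counts, d_counts = Counter(query), Counter(doc)
    return sum(q_counts[t] * idf(t, docs) * d_counts[t] for t in q_counts)


# Toy corpus: each document is a list of tokens (assumed for illustration).
docs = [["walt", "disney"], ["mickey", "mouse"], ["walt", "whitman"]]
query = ["walt", "disney"]

scores = [tfidf_score(query, doc, docs) for doc in docs]
print(scores)
```

The rarer term "disney" carries a larger IDF weight than "walt", so the first document outranks the third even though both contain "walt".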
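  13. Rank Preservation over Sparse Model (Analyzing Dual Encoder Retrieval)

    • Before the discussion, define the Normalized Margin Term (the larger it is, the greater the quality gap between the two documents):
      $\delta(q, d_1, d_2) = \dfrac{q \cdot (d_1 - d_2)}{\lVert q \rVert \cdot \lVert d_1 - d_2 \rVert}$
    • Suppose there is an $A$ that expresses TF-IDF as a reducing map, i.e. $\bar{q} = Aq$, $\bar{d} = Ad$
    • The error rate of this map is proportional to $4\exp(-k(\delta^2 - \delta^3)/4)$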
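  14. Rank Preservation over Sparse Model (Analyzing Dual Encoder Retrieval)

    • Before the discussion, define the Normalized Margin Term (the larger it is, the greater the quality gap between the two documents):
      $\delta(q, d_1, d_2) = \dfrac{q \cdot (d_1 - d_2)}{\lVert q \rVert \cdot \lVert d_1 - d_2 \rVert}$
    • Suppose there is an $A$ that expresses TF-IDF as a reducing map, i.e. $\bar{q} = Aq$, $\bar{d} = Ad$
    • The error rate of this map is proportional to $4\exp(-k(\delta^2 - \delta^3)/4)$
    • Put simply: reducing the error rate requires a certain minimum dimension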
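To make the dependence on $k$ concrete, here is a small sketch that evaluates the normalized margin $\delta(q, d_1, d_2)$ and the expression $4\exp(-k(\delta^2 - \delta^3)/4)$ exactly as written on slides 13 and 14. The vectors and the function names are illustrative assumptions.

```python
import math


def normalized_margin(q, d1, d2):
    """delta(q, d1, d2) = q . (d1 - d2) / (||q|| * ||d1 - d2||)."""
    diff = [a - b for a, b in zip(d1, d2)]
    dot = sum(x * y for x, y in zip(q, diff))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_diff = math.sqrt(sum(x * x for x in diff))
    return dot / (norm_q * norm_diff)


def error_expression(k, delta):
    """The slide's expression 4 * exp(-k * (delta^2 - delta^3) / 4)."""
    return 4 * math.exp(-k * (delta ** 2 - delta ** 3) / 4)


# Toy count vectors over a three-term vocabulary (assumed).
q, d1, d2 = [1, 1, 0], [1, 1, 0], [0, 1, 1]
delta = normalized_margin(q, d1, d2)  # about 0.5 for these vectors

for k in (64, 256, 1024):
    print(k, error_expression(k, delta))
```

For a fixed margin the expression decays exponentially in $k$, while a smaller margin between two documents calls for a larger $k$ to reach the same value.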
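  15. Rank Preservation over Attention Model (Analyzing Dual Encoder Retrieval)

    • For $x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$, the inner product using cross-attention is written as follows (the equation itself is not preserved in this transcript)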
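  16. Rank Preservation over Attention Model (Analyzing Dual Encoder Retrieval)

    • For $x = (x_1, x_2, \ldots)$ and $y = (y_1, y_2, \ldots)$, the inner product using cross-attention is written as follows (the equation itself is not preserved in this transcript)
    • Put simply: the dimension that must be guaranteed is proportional to the square of the query length in tokens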
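  17. Rank Preservation: Summary (Analyzing Dual Encoder Retrieval)

    • The larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder
    • When reducing the dimension with a map of the form $Ax$, the reduced size has to be reasonably large, and it is proportional to the log of the number of documents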
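  18. Rank Preservation: Summary (Analyzing Dual Encoder Retrieval)

    • The larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder
    • When reducing the dimension with a map of the form $Ax$, the reduced size has to be reasonably large, and it is proportional to the log of the number of documents
    • For a TF-IDF-style sparse model, reducing the error rate requires a certain minimum dimension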
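  19. Rank Preservation: Summary (Analyzing Dual Encoder Retrieval)

    • The larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder
    • When reducing the dimension with a map of the form $Ax$, the reduced size has to be reasonably large, and it is proportional to the log of the number of documents
    • For a TF-IDF-style sparse model, reducing the error rate requires a certain minimum dimension
    • With cross-attention, the required dimension is proportional to the square of the query length in tokens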
  20. Experiment: ICT over Wikipedia (Experiment)

    • Method: Inverse Cloze Task (ICT)
      • Split a whole passage into several parts
      • Use one part as the query and the remaining parts as the document
      • The paper generates 1M queries from Wikipedia
      • Recall is measured for both ranking and retrieval
    • Models compared:
      • Cross-Attention, Sum-of-max
      • Dual-Encoder BERT
      • Multi-Vector BERT
      • Sparse Model (BM25)
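The ICT construction described on slide 20 (one part of a passage becomes the query, the rest becomes the document) can be sketched like this. Splitting by sentence, the function names, and the single-gold-document form of recall are assumptions made for illustration and may differ from the paper's setup.

```python
import random


def ict_pair(passage_sentences, rng):
    """Pick one sentence as the query; the remaining sentences form the document."""
    i = rng.randrange(len(passage_sentences))
    query = passage_sentences[i]
    document = passage_sentences[:i] + passage_sentences[i + 1:]
    return query, document


def recall_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold document appears in the top k results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


# Toy passage (assumed); real ICT data would come from Wikipedia passages.
passage = ["Sentence one.", "Sentence two.", "Sentence three."]
query, document = ict_pair(passage, random.Random(0))
print(query, document)

# Example ranking in which the gold document "d1" sits at rank 2.
print(recall_at_k(["d3", "d1", "d2"], "d1", 1))  # 0.0
print(recall_at_k(["d3", "d1", "d2"], "d1", 2))  # 1.0
```

Because the query is cut out of the very passage it should retrieve, the task rewards lexical overlap, which fits the summary slide's point that ICT requires little generalization.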
  21. Experiment: ICT over Wikipedia (Experiment)

    • Results
      • The smaller the embedding size, the lower the performance, and the drop is most pronounced in retrieval
      • In retrieval, BM25 and Multi-Vector do better than expected
  22. Experiment: Open-Domain QA (Experiment)

    • Method: Natural Questions Dataset
      • A question-answer set that asks about the content of real Wikipedia articles
      • Trained on 87,925 examples and tested on 3,610
    • Models compared:
      • Cross-Attention, Sum-of-max
      • Dual-Encoder BERT, Hybrid Dual-Encoder BERT (a linear combination of sparse and dense)
      • Multi-Vector BERT
      • Sparse Model (BM25)
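The hybrid model on slide 22 is described only as a linear combination of sparse and dense scores. A minimal sketch under that description follows; the specific form `dense + weight * sparse`, the `weight` parameter, and the candidate scores are all hypothetical and are not taken from the deck.

```python
def hybrid_score(sparse_score, dense_score, weight):
    """Linear combination of a sparse (e.g. BM25) score and a dense inner product."""
    return dense_score + weight * sparse_score


def rank_hybrid(candidates, weight):
    """Rank (doc_id, sparse_score, dense_score) triples by hybrid score, best first."""
    scored = [(hybrid_score(s, d, weight), doc_id) for doc_id, s, d in candidates]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Made-up scores for three candidate documents.
candidates = [("a", 2.0, 0.1), ("b", 0.5, 0.9), ("c", 0.0, 0.3)]

print(rank_hybrid(candidates, weight=0.0))  # dense only: ['b', 'c', 'a']
print(rank_hybrid(candidates, weight=1.0))  # sparse added: ['a', 'b', 'c']
```

With `weight` set to zero the ranking is purely dense; raising it lets exact term matches pull documents such as "a" back up.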
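  23. Experiment: Open-Domain QA (Experiment)

    • Results
      • Models with a small embedding size still perform worse
      • On real human questions, BM25 does not answer well = lack of generalization ability
      • Also, compared with ICT, differences in method matter far more than embedding size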
  24. Experiment: Short Answer Exact Match (Experiment)

    • Method: Natural Questions Dataset
      • Same as Experiment 2, but measures the cases where the answer matches exactly
    • Models compared:
      • DE-BERT, Hybrid-BERT, Multi-BERT (Best Dense)
      • Sparse Model (BM25)
    • Results:
      • The Hybrid model performs well, and does well when 200 tokens are read
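  25. Summary / Intuition (Experiment)

    • Summary
      • The smaller the embedding size, the lower the performance, most noticeably on data that does not require generalization (ICT)
      • Sparse models perform worse on data that requires generalization (Open-Domain QA)
      • The Hybrid model shows high performance overall = it can take the strengths of both sides
    • Intuition
      • Embedding search is expected to work better than the current BM25 method.
      • Because our data requires a lot of generalization, the gain from using Hybrid is probably not large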