Sparse, Dense, and Attentional Representations for Text Retrieval
Scatter Lab Inc.
August 28, 2020
Transcript
Sparse, Dense and Attentional Representation for Text Retrieval
구상준, ML Scientist

Contents
• Introduction
• Analyzing Dual Encoder Retrieval
• Rank Preservation over Dense Model (Projection)
• Rank Preservation over Sparse Model
• Rank Preservation over Attention Model
• Experiment & Analysis

Introduction

Using Encoders over Retrieval Task
• Given a query, how can we pick out the relevant passages?
• A sparse model such as TF-IDF selects a first-pass set of candidate documents.
• The query and each candidate document are run through a dense encoder, and the projected vectors are used to extract the answer.
Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering, Lee et al., ICLR 2019

Bi-Encoder for Retrieval
• Cross-Encoder vs Bi-Encoder (Dual Encoder)
• Cross-Encoder: the query and the candidate are joined into a single input, which is then classified.
• Bi-Encoder: the query and the candidate are each mapped by a separate encoder, and the similarity of the two vectors is then computed.
Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring, Humeau et al., ICLR 2020
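
The structural difference between the two scoring styles can be sketched with toy scorers. This is only an illustration, not the models from the cited papers: `encode`, `bi_encoder_scores`, `cross_encoder_scores`, and `toy_joint_model` are hypothetical names, the `[SEP]` separator is an assumed convention, and bag-of-words counting stands in for a learned encoder such as BERT.

```python
from collections import Counter

def encode(text):
    """Toy encoder (assumption): bag-of-words counts stand in for a learned model."""
    return Counter(text.lower().split())

def dot(u, v):
    """Inner product of two sparse count vectors."""
    return sum(weight * v[token] for token, weight in u.items())

def bi_encoder_scores(query, doc_vectors):
    """Bi-encoder: encode the query alone, then compare with stored document vectors."""
    query_vector = encode(query)
    return [dot(query_vector, doc_vector) for doc_vector in doc_vectors]

def cross_encoder_scores(query, docs, joint_model):
    """Cross-encoder: the model sees each (query, document) pair as one joint input."""
    return [joint_model(f"{query} [SEP] {doc}") for doc in docs]

def toy_joint_model(joint_text):
    """Hypothetical joint scorer: token overlap between the two halves of the input."""
    query_part, doc_part = joint_text.split(" [SEP] ")
    return dot(encode(query_part), encode(doc_part))

docs = ["walt disney was born in chicago", "the eiffel tower is in paris"]
doc_vectors = [encode(doc) for doc in docs]   # computed once, before any query arrives
query = "when was walt disney born"

bi_scores = bi_encoder_scores(query, doc_vectors)
cross_scores = cross_encoder_scores(query, docs, toy_joint_model)
```

In this toy the two scores coincide. What differs is the call pattern: the bi-encoder reuses document vectors across queries, while the cross-encoder must run the joint model once per (query, document) pair.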
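
Effectiveness of Density
• Dense models generally outperform sparse models, but…
• For long texts, this turns out not to hold in every case.
• Why?
• Is the dimensionality too small, so that the capacity to hold the text is insufficient?
• Or is the ability to generalize over the text (generality) insufficient?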
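
Effectiveness of Density
• Key points of the paper
• To outperform a sparse model, a dense model needs a large dimensionality.
• The required dimensionality is determined by the document length and the vocabulary.
• In addition, when comparing a sparse model, random projection, and an attention model (cross-encoder):
• Random projection still performs quite well at reduced dimensions.
• An attention model needs a smaller dimension but much more computation.

Analyzing Dual Encoder Retrieval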

Mathematical Rep for Encoders
• Assume one-hot representations q and d of the query and the document, and an encoder function f applied to them.
• Assume scores are assigned by similarity: $\langle q, d \rangle$ and $\langle f(q), f(d) \rangle$.
[Figure: query tokens → q → Encoder (LSTM, BERT, …) → f(q); document tokens → d → Encoder (LSTM, BERT, …) → f(d)]

Rank Preservation over Dense Model
• For documents d1 and d2, a "rank-preserving" encoder f satisfies:
  $\langle q, d_1 \rangle > \langle q, d_2 \rangle \Rightarrow \langle f(q), f(d_1) \rangle > \langle f(q), f(d_2) \rangle$
• An "ε-accurate" encoder f satisfies:
  $\left| \|f(q) - f(d)\|^2 - \|q - d\|^2 \right| \le \epsilon \cdot \|q - d\|^2$
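
Both conditions can be checked numerically for a given encoder. The sketch below assumes the norms in the second condition are squared Euclidean norms (the reading used by the Johnson-Lindenstrauss lemma); `preserves_rank`, `is_eps_accurate`, and the `truncate` encoder are hypothetical names chosen for illustration.

```python
def inner(u, v):
    """Inner product <u, v> of two dense vectors."""
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def preserves_rank(f, q, d1, d2):
    """Checks <q,d1> > <q,d2>  =>  <f(q),f(d1)> > <f(q),f(d2)>."""
    if inner(q, d1) > inner(q, d2):
        return inner(f(q), f(d1)) > inner(f(q), f(d2))
    return True  # premise not met, so the implication holds vacuously

def is_eps_accurate(f, q, d, eps):
    """Checks | ||f(q)-f(d)||^2 - ||q-d||^2 | <= eps * ||q-d||^2."""
    original = sq_dist(q, d)
    return abs(sq_dist(f(q), f(d)) - original) <= eps * original

def truncate(x):
    """Hypothetical encoder: keep only the first two coordinates."""
    return x[:2]

q = [1, 1, 0, 0]
d1 = [1, 1, 0, 1]
d2 = [1, 0, 1, 0]

rank_ok = preserves_rank(truncate, q, d1, d2)
accurate_d1 = is_eps_accurate(truncate, q, d1, 0.5)
accurate_d2 = is_eps_accurate(truncate, q, d2, 0.5)
```

Here the truncating encoder happens to keep the ranking of d1 over d2, yet it is not 0.5-accurate for the pair (q, d1), because the only coordinate where they differ is discarded.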
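
Rank Preservation over Dense Model
• If an encoder's error rate is inversely proportional to the size of the vocabulary components, that encoder guarantees rank preservation.
• Put simply, the larger the vocabulary and the more document-query pairs there are, the harder it is to build a good encoder.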

Rank Preservation over Projection
• Projection method: a technique that reduces a vector by projecting it onto hyperplanes.
• Each vector is described by whether it lies on the positive side or the negative side of a hyperplane.
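
Rank Preservation over Projection
• Encoder f with contraction mapping
• Suppose the reducing projection (encoder) f is expressed by a single matrix, i.e. $f(x) = Ax$, and is used to project n documents.
• Rademacher embedding, Gaussian embedding: each entry of A is chosen at random.
• The reduced vector size k needed for a good A is inversely proportional to $(\epsilon^2/2 - \epsilon^3/3)$ and proportional to $\log(n)$.
• Put simply, the reduced dimension has to be reasonably large, and how large depends on the number of documents.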
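
The dependence of k on n and ε can be made concrete. The sketch below assumes the standard Johnson-Lindenstrauss form $k \ge 4 \ln(n) / (\epsilon^2/2 - \epsilon^3/3)$; the constant 4 is an assumption taken from that lemma, since the slide states only the proportionality. The function names and the $1/\sqrt{k}$ scaling are likewise illustrative choices.

```python
import math
import random

def min_projection_dim(n_docs, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3).
    The constant 4 is assumed from the standard JL lemma, not from the slide."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(4.0 * math.log(n_docs) / (eps ** 2 / 2.0 - eps ** 3 / 3.0))

def rademacher_matrix(k, dim, rng):
    """k x dim matrix whose entries are +1/sqrt(k) or -1/sqrt(k) with equal probability."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.choice((-scale, scale)) for _ in range(dim)] for _ in range(k)]

def gaussian_matrix(k, dim, rng):
    """k x dim matrix whose entries are drawn from N(0, 1/k)."""
    sigma = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(k)]

def project(A, x):
    """f(x) = Ax for a dense vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

rng = random.Random(0)
k_small = min_projection_dim(1_000, 0.5)
k_large = min_projection_dim(1_000_000, 0.5)
A = rademacher_matrix(8, 20, rng)
reduced = project(A, [1.0] * 20)
```

Going from a thousand to a million documents roughly doubles the required k, which is the logarithmic growth the slide describes.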
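
Rank Preservation over Sparse Model
• Consider the TF-IDF model: $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$
• TF-IDF similarity can be written in the form $\langle \bar{q}, d \rangle$.
• For BM25, it can be written in the form $\langle \bar{q}, \bar{d} \rangle = \mathrm{BM25}(q, d)$.
[Figure: the query token vector q is multiplied element-wise by IDF to give $\bar{q}$]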
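
The inner-product form of TF-IDF can be sketched in a few lines. The IDF variant used here, $\ln(N / \mathrm{df}_i)$, is one common choice and is an assumption, as are the function names and the toy documents.

```python
import math
from collections import Counter

def idf_table(docs):
    """IDF_i = ln(N / df_i); one common variant, assumed for this sketch."""
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc.split()))
    return {token: math.log(n_docs / count) for token, count in doc_freq.items()}

def tfidf_score(query, doc, idf):
    """<q_bar, d> where q_bar_i = q_i * IDF_i and d_i is the term count in the document."""
    q = Counter(query.split())
    d = Counter(doc.split())
    return sum(q_i * idf.get(token, 0.0) * d[token] for token, q_i in q.items())

docs = [
    "walt disney was born in chicago",
    "walt whitman was a poet",
    "the tower is in paris",
]
idf = idf_table(docs)
scores = [tfidf_score("walt disney born", doc, idf) for doc in docs]
```

Rare terms such as "disney" carry a larger IDF weight than "walt", which appears in two documents, so the first document scores highest.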
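
Rank Preservation over Sparse Model
• Before the discussion, define the normalized margin term (the larger it is, the more the two documents differ):
  $\delta(q, d_1, d_2) = \dfrac{q \cdot (d_1 - d_2)}{\|q\| \cdot \|d_1 - d_2\|}$
• Suppose there is a matrix A that expresses TF-IDF as a reducing projection, i.e. $\bar{q} = Aq$, $\bar{d} = Ad$.
• The error rate of the projection is proportional to $4\exp(-k(\delta^2 - \delta^3)/4)$.
• Put simply, reducing the error rate requires a sufficiently large dimension.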
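
A small numeric sketch shows how the normalized margin and the error term behave. The expression is computed exactly as written on the slide, $4\exp(-k(\delta^2 - \delta^3)/4)$; the function names and example vectors are made up for illustration.

```python
import math

def normalized_margin(q, d1, d2):
    """delta(q, d1, d2) = q . (d1 - d2) / (||q|| * ||d1 - d2||)."""
    diff = [a - b for a, b in zip(d1, d2)]
    numerator = sum(q_i * diff_i for q_i, diff_i in zip(q, diff))
    norm_q = math.sqrt(sum(q_i * q_i for q_i in q))
    norm_diff = math.sqrt(sum(diff_i * diff_i for diff_i in diff))
    return numerator / (norm_q * norm_diff)

def projection_error_term(k, delta):
    """4 * exp(-k * (delta^2 - delta^3) / 4), as written on the slide."""
    return 4.0 * math.exp(-k * (delta ** 2 - delta ** 3) / 4.0)

q = [1, 1, 0, 0]
d1 = [1, 1, 0, 1]
d2 = [1, 0, 1, 0]

delta = normalized_margin(q, d1, d2)
error_k100 = projection_error_term(100, delta)
error_k1000 = projection_error_term(1000, delta)
```

For a fixed margin the term shrinks exponentially as k grows, and for a fixed k a smaller margin (two documents that are harder to tell apart) makes it larger.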
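
Rank Preservation over Attention Model
• For x = (x1, x2, …) and y = (y1, y2, …), the inner product using cross-attention is written as follows:
  [Equation: cross-attention inner product; not recoverable from the transcript]
• Put simply, the dimension that must be guaranteed is proportional to the square of the query token length.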
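
Rank Preservation (Summary)
• The larger the vocabulary and the more document-query pairs there are, the harder it is to build a good encoder.
• When reducing the dimension in the form Ax, the reduced size has to be reasonably large, and it is proportional to the log of the number of documents.
• For a TF-IDF-style sparse model, reducing the error rate requires a sufficiently large dimension.
• With cross-attention, the required dimension is proportional to the square of the query token length.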

Experiment

Experiment 1: ICT over Wikipedia
• Method: Inverse Cloze Task
• Split a passage into several parts.
• Use one part as the query and the remaining parts as the document.
• The paper uses Wikipedia to generate 1M queries.
• Recall is measured for both ranking and retrieval.
• Compared models:
• Cross-Attention, Sum-of-max
• Dual-Encoder BERT
• Multi-Vector BERT
• Sparse Model (BM25)
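
The data construction described above can be sketched as follows. This is only a minimal illustration of the splitting step, assuming the passage has already been divided into parts; `ict_example` and `ict_examples` are hypothetical names, and the paper's actual preprocessing may differ.

```python
def ict_example(parts, query_index):
    """Use one part of the passage as the query and the remaining parts as the document."""
    query = parts[query_index]
    document = " ".join(part for i, part in enumerate(parts) if i != query_index)
    return query, document

def ict_examples(parts):
    """One (query, document) pair per part of the passage."""
    return [ict_example(parts, i) for i in range(len(parts))]

passage_parts = [
    "Walt Disney was born in Chicago.",
    "He co-founded an animation studio.",
    "The studio released many films.",
]
pairs = ict_examples(passage_parts)
```

Because the query is literally cut out of the passage, no human labeling is needed, which is how a large query set can be generated from Wikipedia alone.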

Experiment 1: ICT over Wikipedia
• Results
• The smaller the embedding size, the lower the performance, and the effect is most pronounced in retrieval.
• In retrieval, BM25 and Multi-Vector do better than expected.

Experiment 2: Open Domain QA
• Method: Natural Questions Dataset
• Made up of questions that ask about actual Wikipedia content.
• Trained on 87,925 examples and evaluated on 3,610.
• Compared models:
• Cross-Attention, Sum-of-max
• Dual-Encoder BERT, Hybrid Dual-Encoder BERT (linear combination of sparse and dense)
• Multi-Vector BERT
• Sparse Model (BM25)
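
A hybrid scorer of the kind listed above can be sketched as a weighted sum of a dense score and a sparse score. The slide says only that the two are combined linearly; the `weight` parameter, the function names, and the example scores are all assumptions made for illustration.

```python
def hybrid_score(dense_score, sparse_score, weight):
    """Linear combination of the two scores; `weight` is a hypothetical trade-off parameter."""
    return dense_score + weight * sparse_score

def rank_documents(doc_ids, dense, sparse, weight):
    """Sort document ids by hybrid score, highest first."""
    return sorted(
        doc_ids,
        key=lambda doc_id: hybrid_score(dense[doc_id], sparse[doc_id], weight),
        reverse=True,
    )

# Made-up scores for three documents.
dense = {"a": 0.9, "b": 0.7, "c": 0.2}
sparse = {"a": 1.0, "b": 6.0, "c": 3.0}

dense_only = rank_documents(["a", "b", "c"], dense, sparse, weight=0.0)
hybrid = rank_documents(["a", "b", "c"], dense, sparse, weight=0.1)
```

With the weight at zero the ranking is purely dense; a small positive weight lets a strong lexical match (document "b") overtake a slightly better dense match.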
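
Experiment 2: Open Domain QA
• Results
• Again, models with a smaller embedding size perform worse.
• On questions written by real people, BM25 does poorly, which points to a lack of generalization ability.
• Also, compared with ICT, the choice of method matters far more than the embedding size.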

Experiment 3: Short Answer Exact Match
• Method: Natural Questions Dataset
• Same as Experiment 2, but measures the cases where the answer matches exactly.
• Compared models:
• DE-BERT, Hybrid-BERT, Multi-BERT (Best Dense)
• Sparse Model (BM25)
• Results:
• The hybrid model performs well, and does best when 200 tokens are considered.
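
Summary and Intuition
• Summary
• The smaller the embedding size, the lower the performance, most visibly on data that does not demand generalization (ICT).
• Sparse models perform worse on data that does demand generalization (Open-Domain QA).
• The hybrid model performs well overall, since it can take the strengths of both sides.
• Intuition
• Embedding-based search is expected to do better than the current BM25 approach.
• Our data demands a lot of generalization, so using a hybrid is unlikely to pay off.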