Sparse, Dense, and Attentional Representations for Text Retrieval
Scatter Lab Inc.
August 28, 2020
Transcript
Sparse, Dense, and Attentional Representations for Text Retrieval
구상준, ML Scientist
Contents
• Introduction
• Analyzing Dual Encoder Retrieval
• Rank Preservation over Dense Model (Projection)
• Rank Preservation over Sparse Model
• Rank Preservation over Attention Model
• Experiment & Analysis
Introduction
Using Encoders over Retrieval Task
• Given a query, how can we pick out the passages relevant to it?
• A sparse model such as TF-IDF first selects a first-pass set of candidate documents.
• The query and each candidate document are then passed through a dense encoder, and the resulting vectors are used to infer the answer.
Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering, Lee et al., ICLR 2019
Bi-Encoder for Retrieval
• Cross-Encoder vs Bi-Encoder (Dual Encoder)
• Cross-Encoder: the query and the candidate are concatenated into a single input and classified jointly.
• Bi-Encoder: the query and the candidate are each mapped by a separate encoder, and a similarity is then computed between the two vectors.
Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring, Humeau et al., ICLR 2020
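Effectiveness of Density
• Dense models generally outperform sparse models, but…
• For long texts, this was found not to hold necessarily.
• Why?
• Is the dimension too small, so that the capacity to hold the text's information is insufficient?
• Or is the ability to generalize over the text (generality) insufficient?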
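Effectiveness of Density
• Key points of the paper
• To exceed the performance of a sparse model, the dimension of the dense model must be large enough.
• The required dimension is determined by the document length and the vocabulary size.
• In addition, the paper compares the sparse model, random projection, and the attention model (cross-encoder):
• Random projection performs quite well even at low dimension.
• The attention model needs a smaller dimension but requires much more computation.
Analyzing Dual Encoder Retrieval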
Mathematical Representation for Encoders
• Assume one-hot representations q and d of the query and the document, and an encoder function f for them.
• Assume scores are assigned by similarity: $\langle q, d \rangle$ and $\langle f(q), f(d) \rangle$.
[Figure: the query tokens form q and the document tokens form d; each is passed through an encoder (LSTM, BERT, …) to give f(q) and f(d).]
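Rank Preservation over Dense Model
• For documents d1 and d2, a "rank-preserving" encoder function f satisfies:
  $\langle q, d_1 \rangle > \langle q, d_2 \rangle \Rightarrow \langle f(q), f(d_1) \rangle > \langle f(q), f(d_2) \rangle$
• For documents d1 and d2, an "ε-precise" encoder function f satisfies, for each document d:
  $\left| \lVert f(q) - f(d) \rVert^2 - \lVert q - d \rVert^2 \right| \le \epsilon \cdot \lVert q - d \rVert^2$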
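The two conditions on this slide can be checked numerically on small examples. The following is a minimal sketch, assuming plain Python lists as bag-of-words vectors; `preserves_rank`, `is_eps_precise`, and the two toy encoders are hypothetical names introduced here for illustration and do not come from the paper.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def preserves_rank(f, q, d1, d2):
    """True if <q,d1> > <q,d2> implies <f(q),f(d1)> > <f(q),f(d2)>."""
    if dot(q, d1) > dot(q, d2):
        return dot(f(q), f(d1)) > dot(f(q), f(d2))
    return True  # the premise is false, so the implication holds

def is_eps_precise(f, q, d, eps):
    """True if | ||f(q)-f(d)||^2 - ||q-d||^2 | <= eps * ||q-d||^2."""
    original = sq_dist(q, d)
    return abs(sq_dist(f(q), f(d)) - original) <= eps * original

# Toy encoders (assumptions for illustration only).
identity = lambda x: list(x)
drop_last = lambda x: list(x[:-1])  # discards the last vocabulary coordinate

q = [1, 0, 1]   # query uses terms 0 and 2
d1 = [0, 0, 1]  # shares term 2 with the query
d2 = [0, 1, 0]  # shares nothing with the query

print(preserves_rank(identity, q, d1, d2))   # the identity keeps the order
print(preserves_rank(drop_last, q, d1, d2))  # dropping term 2 loses the order
```

The second encoder fails because the only coordinate that separates d1 from d2 for this query is the one it throws away, which is the kind of information loss the following slides quantify.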
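Rank Preservation over Dense Model
• If an encoder's error rate is inversely proportional to the magnitude of the vocabulary component, the encoder preserves rank.
• Put simply, the larger the vocabulary and the more document-query pairs there are, the harder it is to build a good encoder.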
Rank Preservation over Projection
• Projection method: a technique that reduces a vector by mapping it onto hyperplanes.
• Each vector is described by whether it lies on the positive or the negative side of each hyperplane.
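Rank Preservation over Projection
• Encoder f with contraction mapping
• Suppose a dimension-reducing map (encoder) f is expressed by a single matrix, i.e. $f(x) = Ax$, and is used to map n documents.
• Rademacher embedding, Gaussian embedding: each entry of A is chosen at random.
• The reduced dimension k needed for a good A is inversely proportional to $\epsilon^2/2 - \epsilon^3/3$ and proportional to $\log(n)$.
• Put simply, the reduced dimension must be reasonably large, and how large depends on the number of documents.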
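The dependence of k on ε and n can be made concrete with a small random projection. This is a sketch under stated assumptions: the constant 4 in `min_dim` is taken from the usual statement of the Johnson-Lindenstrauss lemma (the slide gives only the proportionality), and `min_dim`, `rademacher_matrix`, and `project` are hypothetical helper names.

```python
import math
import random

def min_dim(n, eps):
    """Smallest integer k with k >= 4 * ln(n) / (eps^2/2 - eps^3/3).

    The constant 4 is an assumption; the slide states only that k is
    proportional to log(n) and inversely proportional to the eps term.
    """
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))

def rademacher_matrix(k, dim, seed=0):
    """k x dim matrix with entries drawn uniformly from {-1, +1} / sqrt(k)."""
    rng = random.Random(seed)
    scale = 1 / math.sqrt(k)
    return [[rng.choice((-1, 1)) * scale for _ in range(dim)] for _ in range(k)]

def project(A, x):
    """Encoder f(x) = Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

k = min_dim(n=1000, eps=0.5)
A = rademacher_matrix(k, dim=2000)
x = [1.0 if i % 50 == 0 else 0.0 for i in range(2000)]  # sparse bag of words
ratio = sum(v * v for v in project(A, x)) / sum(v * v for v in x)
print(k, round(ratio, 3))  # the squared-norm ratio should be close to 1
```

Halving ε roughly quadruples the required k, while multiplying the number of documents by ten adds only a constant amount, which matches the slide's reading that the document count enters through its logarithm.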
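Rank Preservation over Sparse Model
• Consider the TF-IDF model: $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$
• The TF-IDF similarity can be written in the form $\langle \bar{q}, d \rangle$.
• For BM25, it can be written as $\langle \bar{q}, \bar{d} \rangle = \mathrm{BM25}(q, d)$.
[Figure: the query tokens q are weighted by IDF to give $\bar{q}$.]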
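To illustrate how a TF-IDF score reduces to an inner product with an IDF-weighted query, here is a minimal sketch. The IDF formula ln(N / df) and the toy corpus are assumptions made for this example; `idf_weights` and `tfidf_score` are hypothetical names.

```python
import math
from collections import Counter

def idf_weights(docs):
    """IDF_i = ln(N / df_i); this particular IDF formula is an assumption."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_score(query, doc, idf):
    """Inner product of q_bar and d, where q_bar_i = q_i * IDF_i
    and d_i is the count of term i in the document."""
    q = Counter(query)
    d = Counter(doc)
    return sum(q[t] * idf.get(t, 0.0) * d[t] for t in q)

# Toy corpus, made up for illustration.
docs = [
    "the mouse is a rodent".split(),
    "the mickey mouse cartoon is by disney".split(),
    "the cat is a pet".split(),
]
idf = idf_weights(docs)
query = "disney mouse".split()
scores = [tfidf_score(query, doc, idf) for doc in docs]
print(scores)  # the document containing both query terms scores highest
```

Because the score is a plain inner product over the vocabulary, the same rank-preservation questions asked of a dense encoder can be asked of a compressed version of these sparse vectors.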
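Rank Preservation over Sparse Model
• Before the discussion, define the normalized margin term (the larger it is, the more the two documents differ):
  $\delta(q, d_1, d_2) = \dfrac{q \cdot (d_1 - d_2)}{\lVert q \rVert \cdot \lVert d_1 - d_2 \rVert}$
• Suppose there is a matrix A that expresses TF-IDF as a dimension-reducing map, i.e. $\bar{q} = Aq$, $\bar{d} = Ad$.
• The error rate of the mapping is proportional to $4\exp(-k(\delta^2 - \delta^3)/4)$.
• Put simply, the dimension must be reasonably large to reduce the error rate.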
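The normalized margin and the error expression on this slide are simple to evaluate directly. The sketch below assumes plain Python lists as vectors; `normalized_margin` and `error_term` are hypothetical names, and the example vectors are made up.

```python
import math

def normalized_margin(q, d1, d2):
    """delta(q, d1, d2) = q . (d1 - d2) / (||q|| * ||d1 - d2||)."""
    diff = [a - b for a, b in zip(d1, d2)]
    num = sum(qi * di for qi, di in zip(q, diff))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_diff = math.sqrt(sum(di * di for di in diff))
    return num / (norm_q * norm_diff)

def error_term(k, delta):
    """The slide's error expression 4 * exp(-k * (delta^2 - delta^3) / 4)."""
    return 4 * math.exp(-k * (delta ** 2 - delta ** 3) / 4)

q, d1, d2 = [1, 0], [1, 0], [0, 1]
delta = normalized_margin(q, d1, d2)  # equals 1 / sqrt(2) for these vectors
for k in (16, 64, 256):
    print(k, error_term(k, delta))    # shrinks as k grows
```

For a fixed margin the expression decays exponentially in k, so a small margin between two documents has to be compensated by a larger embedding dimension.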
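Rank Preservation over Attention Model
• For x = (x1, x2, …) and y = (y1, y2, …), the inner product using cross-attention is written as follows. [Equation not preserved in the transcript.]
• Put simply, the dimension that must be secured is proportional to the square of the query token length.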
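Rank Preservation: Summary
• The larger the vocabulary and the more document-query pairs there are, the harder it is to build a good encoder.
• When reducing dimension with a map of the form Ax, the reduced dimension must be reasonably large, and it is proportional to the log of the number of documents.
• For a TF-IDF-style sparse model, the dimension must be reasonably large to reduce the error rate.
• With cross-attention, the required dimension is proportional to the square of the query token length.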
Experiment
Experiment: ICT over Wikipedia
• Method: Inverse Cloze Task (ICT)
• Split a passage into several parts.
• Use one part as the query and the remaining parts as the document.
• The paper uses Wikipedia to generate 1M queries.
• Recall is measured for both ranking and retrieval.
• Baselines: Cross-Attention, Sum-of-max; Dual-Encoder BERT; Multi-Vector BERT; Sparse Model (BM25)
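The query/document construction described on this slide can be sketched in a few lines. This is an illustration only: splitting on sentence boundaries, holding out every part in turn, and the name `ict_pairs` are assumptions, and the example passage is made up rather than taken from the paper's data.

```python
def ict_pairs(passage_parts):
    """Yield (query, document) pairs: each part in turn is held out as the
    query, and the remaining parts, joined, form the document."""
    for i, part in enumerate(passage_parts):
        rest = passage_parts[:i] + passage_parts[i + 1:]
        yield part, " ".join(rest)

# Splitting on sentence boundaries is an assumption for this sketch.
passage = [
    "Zebras are African equines.",
    "They have black-and-white striped coats.",
    "There are three living species.",
]
for query, document in ict_pairs(passage):
    print(repr(query), "->", repr(document))
```

Since the query is cut out of the same passage as its document, the pairs share a great deal of vocabulary and can be generated without any human labeling.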
Experiment: ICT over Wikipedia
• Results
• The smaller the embedding size, the lower the performance, and the effect is most pronounced in retrieval.
• In retrieval, BM25 and Multi-Vector do better than expected.
Experiment: Open-Domain QA
• Method: Natural Questions dataset
• Consists of questions asking about actual Wikipedia content.
• Trained on 87,925 examples and evaluated on 3,610.
• Baselines: Cross-Attention, Sum-of-max; Dual-Encoder BERT; Hybrid Dual-Encoder BERT (linear combination of sparse and dense); Multi-Vector BERT; Sparse Model (BM25)
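The hybrid baseline is described only as a linear combination of sparse and dense scores. A minimal sketch of that idea follows, assuming the form dense + λ · sparse; the weight `lam`, the function names, and the candidate scores are all hypothetical and not taken from the paper.

```python
def hybrid_score(dense_score, sparse_score, lam=1.0):
    """Linear combination of a dense and a sparse score.

    `lam` is a hypothetical mixing weight; the slide does not give its value.
    """
    return dense_score + lam * sparse_score

def rank(candidates, lam=1.0):
    """Sort (doc_id, dense_score, sparse_score) tuples by hybrid score."""
    return sorted(candidates, key=lambda c: hybrid_score(c[1], c[2], lam), reverse=True)

# Made-up scores for illustration only.
candidates = [("doc_a", 0.9, 0.1), ("doc_b", 0.4, 1.2), ("doc_c", 0.6, 0.3)]
print([doc_id for doc_id, _, _ in rank(candidates, lam=0.0)])  # dense only
print([doc_id for doc_id, _, _ in rank(candidates, lam=1.0)])  # dense + sparse
```

With the weight at zero the ranking is purely dense; raising it lets a strong lexical match such as `doc_b` overtake candidates that only the dense encoder favors.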
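Experiment: Open-Domain QA
• Results
• Here too, models with a small embedding size perform worse.
• On real human questions, BM25 does not do well, which indicates a lack of generalization ability.
• Also, compared with ICT, the choice of method matters far more than the embedding size.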
Experiment: Short Answer Exact Match
• Method: Natural Questions dataset
• Same as Experiment 2, but measures the cases where the answer matches exactly.
• Baselines: DE-BERT, Hybrid-BERT, Multi-BERT (best dense); Sparse Model (BM25)
• Results: the Hybrid model performs well, and it is best when looking at 200 tokens.
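Summary & Intuition
• Summary
• The smaller the embedding size, the lower the performance, most noticeably on data that does not demand generalization (ICT).
• Sparse models perform worse on data that demands generalization (Open-Domain QA).
• The Hybrid model shows high performance overall, since it can take the strengths of both sides.
• Intuition
• Embedding search is expected to work better than our current BM25 approach.
• Because our data demands a lot of generalization, using Hybrid is probably not worthwhile.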