Sparse, Dense, and Attentional Representations for Text Retrieval
Scatter Lab Inc.
August 28, 2020
Transcript
Sparse, Dense and Attentional Representation for Text Retrieval
구상준, ML Scientist
Contents
• Introduction
• Analyzing Dual Encoder Retrieval
• Rank Preservation over Dense Model (Projection)
• Rank Preservation over Sparse Model
• Rank Preservation over Attention Model
• Experiment & Analysis
Introduction
Using Encoders over Retrieval Task
• Given a query, how can we pick out the relevant passages?
• Use a sparse model such as TF-IDF to select a first round of candidate documents.
• Run the query and each candidate document through a dense encoder, then use the resulting vectors to extract the answer.
Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering, Lee et al., ICLR 2019
Bi-Encoder for Retrieval
• Cross-Encoder vs Bi-Encoder (Dual Encoder)
• Cross-Encoder: joins the query and the candidate into a single input and classifies it.
• Bi-Encoder: maps the query and the candidate with separate encoders, then computes the similarity between the two vectors.
Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring, Humeau et al., ICLR 2020
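Effectiveness of Density
• Dense models generally outperform sparse models, but…
• For long texts this is observed not to hold necessarily.
• Why?
• Is the dimension too small, so that the capacity to hold the text's information is insufficient?
• Or is the ability to generalize over texts (generality) insufficient?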
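Effectiveness of Density
• Key points of this paper
• To outperform a sparse model, a dense model needs a large dimension.
• The required dimension is determined by the document length and the vocabulary size.
• In addition, when comparing sparse models, random projection, and attention models (Cross Enc):
• Random projection performs quite well even at reduced dimension.
• Attention models need a smaller dimension but considerably more computation.
Analyzing Dual Encoder Retrieval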
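Mathematical Rep for Encoders
• Assume 1-hot representations q and d of the query and the document, and an encoder function f applied to them.
• Assume scores are assigned by similarity: ⟨q, d⟩ and ⟨f(q), f(d)⟩.
[Figure: the tokens of a query (“When was Walt Disney born?”) and of a document (“Walt Elias Disney …”) form the vectors q and d, which an encoder (LSTM, BERT, …) maps to f(q) and f(d).]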
Rank Preservation over Dense Model
• An encoder function f that is “rank-preserving” for d1, d2 satisfies:
  ⟨q, d1⟩ > ⟨q, d2⟩ ⇒ ⟨f(q), f(d1)⟩ > ⟨f(q), f(d2)⟩
• An encoder function that is “ε-precise” satisfies:
  | ‖f(q) − f(d)‖² − ‖q − d‖² | ≤ ε · ‖q − d‖²
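The two conditions on this slide can be checked numerically for any candidate encoder. The sketch below is illustrative only: the helper names and the toy encoder that drops the last coordinate are assumptions, not part of the deck, and the ε-precision check reads the norms as squared Euclidean distances.

```python
def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def is_rank_preserving(f, q, d1, d2):
    """Check <q,d1> > <q,d2>  =>  <f(q),f(d1)> > <f(q),f(d2)> for one triple."""
    if dot(q, d1) <= dot(q, d2):
        return True  # premise is false, so the implication holds vacuously
    fq = f(q)
    return dot(fq, f(d1)) > dot(fq, f(d2))


def is_eps_precise(f, q, d, eps):
    """Check | ||f(q)-f(d)||^2 - ||q-d||^2 | <= eps * ||q-d||^2 for one pair."""
    original = sq_dist(q, d)
    encoded = sq_dist(f(q), f(d))
    return abs(encoded - original) <= eps * original


# Hypothetical toy encoder: drop the last coordinate (a crude compression).
def drop_last(x):
    return x[:-1]
```

With q = [1, 0, 1], d1 = [1, 0, 1] and d2 = [1, 1, 0], the identity encoder keeps the ranking, while `drop_last` collapses the two scores to a tie and so fails the rank-preservation check.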
Rank Preservation over Dense Model
• If an encoder has an error rate inversely proportional to the size of the vocabulary components, that encoder is guaranteed to preserve rank.
• Put simply, the larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.
Rank Preservation over Projection
• Projection method: a technique that reduces a vector by mapping it onto a hyperplane.
• Relative to the hyperplane, a vector is described by whether it lies on the positive side or on the other side.
Rank Preservation over Projection
• Encoder f with contraction mapping
• Suppose a contraction mapping (encoder) f expressed as a single matrix, i.e. f(x) = Ax, is used to map n documents.
• Rademacher embedding, Gaussian embedding: each entry of A is chosen at random.
• The reduced vector size k needed for a good A is inversely proportional to (ε²/2 − ε³/3) and proportional to log(n).
• Put simply, the reduced dimension has to be reasonably large, and how large depends on the number of documents.
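To make the scaling of k concrete, here is a minimal sketch of a Rademacher projection f(x) = Ax together with the dimension estimate log(n) / (ε²/2 − ε³/3). The slide gives only the proportionality; the constant factor `c = 4.0` follows the usual Johnson-Lindenstrauss statement and is an assumption here, as are all function names.

```python
import math
import random


def required_dim(n_docs, eps, c=4.0):
    """Estimate k ~ c * log(n) / (eps^2/2 - eps^3/3).

    The constant c is an assumption; the slide states proportionality only.
    """
    return math.ceil(c * math.log(n_docs) / (eps ** 2 / 2 - eps ** 3 / 3))


def rademacher_matrix(k, vocab_size, seed=0):
    """k x vocab_size matrix with entries +1/sqrt(k) or -1/sqrt(k), chosen at random."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.choice((-1.0, 1.0)) * scale for _ in range(vocab_size)]
            for _ in range(k)]


def project(A, x):
    """Apply the linear encoder f(x) = Ax."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Usage: compress a 20-term bag-of-words vector down to 8 dimensions.
A = rademacher_matrix(k=8, vocab_size=20)
compressed = project(A, [1.0] * 20)
```

Because k grows with log(n), going from a thousand to a million documents only doubles the estimate, whereas tightening ε increases it much faster.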
Rank Preservation over Sparse Model
• Consider the TF-IDF model: q̄_i = q_i · IDF_i
• TF-IDF similarity can be written in the form ⟨q̄, d⟩.
• For BM25, the score can be written in the form ⟨q̄, d̄⟩ = BM25(q, d).
[Figure: the query tokens are weighted by IDF to give q̄, in which informative tokens such as “Walt” and “Disney” dominate.]
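The TF-IDF case can be written out directly as an inner product. This is a small sketch under stated assumptions: vectors are term-count lists over a shared vocabulary, and IDF is taken as log(N / df), which is one common variant and not necessarily the one used in the talk.

```python
import math


def idf_weights(doc_vectors):
    """IDF_i = log(N / df_i) per vocabulary index (assumed variant); 0 if the term never occurs."""
    n = len(doc_vectors)
    vocab_size = len(doc_vectors[0])
    weights = []
    for i in range(vocab_size):
        df = sum(1 for d in doc_vectors if d[i] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


def tfidf_score(q, d, idf):
    """Score <q_bar, d> where q_bar_i = q_i * IDF_i and d holds term counts."""
    q_bar = [qi * w for qi, w in zip(q, idf)]
    return sum(a * b for a, b in zip(q_bar, d))


# Toy collection over a 3-term vocabulary (term counts per document).
docs = [[2, 1, 0], [0, 1, 0], [0, 1, 3]]
idf = idf_weights(docs)
query = [1, 1, 0]
scores = [tfidf_score(query, d, idf) for d in docs]
```

The second term appears in every document, so its IDF is zero and it contributes nothing; only the first document, which contains the rare first term, receives a positive score.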
Rank Preservation over Sparse Model
• Before the discussion, define the normalized margin term (the larger it is, the more the two documents differ):
  δ(q, d1, d2) = q · (d1 − d2) / (‖q‖ · ‖d1 − d2‖)
• Suppose there is a matrix A that expresses TF-IDF as a contraction mapping, i.e. q̄ = Aq, d̄ = Ad.
• The error rate of the mapping is then proportional to 4 exp(−k(δ² − δ³)/4).
• Put simply, reducing the error rate requires a sufficiently large dimension.
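A short numeric sketch shows how the normalized margin δ and the term 4 exp(−k(δ² − δ³)/4) interact. The function names are illustrative assumptions, and the second function simply evaluates the expression from the slide rather than asserting what kind of guarantee it represents.

```python
import math


def normalized_margin(q, d1, d2):
    """delta(q, d1, d2) = q . (d1 - d2) / (||q|| * ||d1 - d2||)."""
    diff = [a - b for a, b in zip(d1, d2)]
    numerator = sum(a * b for a, b in zip(q, diff))
    q_norm = math.sqrt(sum(a * a for a in q))
    diff_norm = math.sqrt(sum(a * a for a in diff))
    return numerator / (q_norm * diff_norm)


def error_term(k, delta):
    """Evaluate 4 * exp(-k * (delta^2 - delta^3) / 4) for projection size k."""
    return 4.0 * math.exp(-k * (delta ** 2 - delta ** 3) / 4.0)


# Toy example: the query matches d1 exactly and shares nothing with d2.
delta = normalized_margin([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
terms = {k: error_term(k, delta) for k in (0, 100, 400)}
```

For a fixed margin the term shrinks exponentially as k grows, and for a fixed k it is larger when δ is small, which is the situation where the two documents are hard to tell apart.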
Rank Preservation over Attention Model
• For x = (x1, x2, …), y = (y1, y2, …), the inner product computed with cross-attention can be written out explicitly.
• Put simply, the dimension that has to be guaranteed is proportional to the square of the query length in tokens.
Rank Preservation Summary
• The larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.
• When reducing dimension in the form Ax, the reduced size has to be reasonably large, and it is proportional to the log of the number of documents.
• For TF-IDF-style sparse models, reducing the error rate requires a sufficiently large dimension.
• With cross-attention, the required dimension is proportional to the square of the query length in tokens.
Experiment
Experiment 1: ICT over Wikipedia
• Method: Inverse Cloze Test
• Split a passage into several parts.
• Use one part as the query and the remaining parts as the document.
• The paper uses Wikipedia to generate about 1M queries.
• Recall is measured for both ranking and retrieval.
• Models compared:
• Cross-Attention, Sum-of-max
• Dual-Encoder BERT
• Multi-Vector BERT
• Sparse Model (BM25)
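The Inverse Cloze Test construction can be sketched in a few lines. This is a minimal illustration that assumes the passage has already been split into parts (for example sentences); how the paper segments Wikipedia passages is not stated in the deck, and the function name is hypothetical.

```python
def ict_examples(parts):
    """Build (query, document) pairs: each part in turn is the query,
    and the remaining parts, joined in order, form the document."""
    examples = []
    for i, query in enumerate(parts):
        remaining = parts[:i] + parts[i + 1:]
        examples.append((query, " ".join(remaining)))
    return examples


# Usage with a toy three-part passage.
passage_parts = [
    "Walt Disney was an American animator.",
    "He co-founded a film studio.",
    "He won many Academy Awards.",
]
pairs = ict_examples(passage_parts)
```

Each pair is a positive training example: the query was removed from the document it is matched with, so the model has to rely on context rather than exact overlap of the removed text.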
Experiment 1: ICT over Wikipedia
• Results
• The smaller the embedding size, the lower the performance, and the effect is most pronounced in retrieval.
• In retrieval, BM25 and Multi-Vector do better than expected.
Experiment 2: Open-Domain QA
• Method: Natural Questions Dataset
• Consists of questions asking about actual Wikipedia content.
• Trained on 87,925 examples and evaluated on 3,610.
• Models compared:
• Cross-Attention, Sum-of-max
• Dual-Encoder BERT, Hybrid Dual-Encoder BERT (linear combination of sparse and dense)
• Multi-Vector BERT
• Sparse Model (BM25)
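The Hybrid Dual-Encoder entry combines a sparse score and a dense score linearly. The sketch below shows one plausible form, dense + λ · sparse, and ranks candidates by it; the weight name `lam`, its default value, and the exact form of the combination are assumptions, since the deck does not spell them out.

```python
def rank_hybrid(candidates, lam=1.0):
    """Rank candidates by dense_score + lam * sparse_score (assumed form).

    candidates: list of (doc_id, dense_score, sparse_score) tuples.
    Returns (doc_id, hybrid_score) pairs, best first.
    """
    scored = [(doc_id, dense + lam * sparse)
              for doc_id, dense, sparse in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy scores: "b" has weak dense similarity but strong lexical overlap.
candidates = [("a", 0.9, 1.0), ("b", 0.5, 4.0), ("c", 0.7, 0.0)]
dense_only = [doc_id for doc_id, _ in rank_hybrid(candidates, lam=0.0)]
hybrid = [doc_id for doc_id, _ in rank_hybrid(candidates, lam=1.0)]
```

Setting `lam` to zero recovers the pure dense ranking, so the weight controls how much exact term matching is allowed to override the learned similarity.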
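Experiment 2: Open-Domain QA
• Results
• Models with a small embedding size still perform worse.
• On questions written by real people, BM25 does not do well, which indicates a lack of generalization ability.
• In addition, compared with ICT, the choice of method matters far more than the embedding size.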
Experiment 3: Short Answer Exact Match
• Method: Natural Questions Dataset
• Same as Experiment 2, but measures the cases where the answer matches exactly.
• Models compared:
• DE-BERT, Hybrid-BERT, Multi-BERT (Best Dense)
• Sparse Model (BM25)
• Results:
• The hybrid model performs well, and does well when 200 tokens are read.
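Summary & Intuition
• Summary
• The smaller the embedding size, the lower the performance, most visibly on data that does not demand generalization (ICT).
• Sparse models perform worse on data that does demand generalization (Open-Domain QA).
• The hybrid model shows high performance overall, since it can take the strengths of both sides.
• Intuition
• Embedding-based search is expected to work better than the current BM25 approach.
• Our data demands a lot of generalization, so using a hybrid is unlikely to bring much benefit.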