Scatter Lab Inc.
August 28, 2020
Research
Sparse, Dense, and Attentional Representations for Text Retrieval
Transcript
Sparse, Dense and Attentional Representation for Text Retrieval
구상준, ML Scientist

Contents
• Introduction
• Analyzing Dual Encoder Retrieval
  • Rank Preservation over Dense Model (Projection)
  • Rank Preservation over Sparse Model
  • Rank Preservation over Attention Model
• Experiment & Analysis

Introduction

Using Encoders over Retrieval Task
• How can we pick out the passages that are relevant to a given query?
• Use a sparse model such as TF-IDF to select first-stage candidate documents.
• Run the query and each candidate document through a dense encoder, then use the mapped vectors to extract the final result.
Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering, Lee et al., ICLR 2019

Bi-Encoder for Retrieval
• Cross-Encoder vs Bi-Encoder (Dual Encoder)
• Cross-Encoder: the query and the candidate are concatenated into a single input, which is then classified.
• Bi-Encoder: the query and the candidate are each mapped by a separate encoder, and a similarity between the two vectors is then computed.
Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring, Humeau et al., ICLR 2020
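
The structural difference between the two scoring styles can be sketched in a few lines. This is a toy illustration only: `encode`, `bi_encoder_rank`, and `cross_encoder_score` are hypothetical names, the bag-of-words counter stands in for a learned encoder such as BERT, and the token-overlap count stands in for a learned joint classifier.

```python
from collections import Counter

def encode(text):
    """Toy stand-in for a learned encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def dot(u, v):
    """Inner product of two sparse count vectors."""
    return sum(u[t] * v[t] for t in u if t in v)

def bi_encoder_rank(query, docs, doc_vecs=None):
    """Rank documents by <f(q), f(d)>; returns document indices, best first."""
    # Document vectors do not depend on the query, so they can be
    # computed once offline and passed in via doc_vecs.
    doc_vecs = doc_vecs or [encode(d) for d in docs]
    q = encode(query)
    scores = [dot(q, dv) for dv in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])

def cross_encoder_score(query, doc):
    """Toy joint scorer: it only ever sees the concatenated pair."""
    joint = (query + " [SEP] " + doc).lower().split()
    sep = joint.index("[sep]")
    q_tokens, d_tokens = joint[:sep], joint[sep + 1:]
    # Hypothetical scoring rule: number of query tokens found in the document.
    return sum(1 for t in q_tokens if t in d_tokens)
```

The point of the sketch is the interface: the bi-encoder can reuse precomputed document vectors for every query, while the cross-encoder must be run once per (query, document) pair.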
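
Effectiveness of Density
• In general, dense models outperform sparse models, but…
• For long texts, this was found not to hold in every case.
• Why?
• Is the dimension too small, so that the capacity to hold the text's information is insufficient?
• Or is the ability to generalize over text (generality) insufficient?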
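
Effectiveness of Density
• Key points of this paper
  • To exceed the performance of a sparse model, the dimension of the dense model must be large.
  • The required dimension is determined by the document length and the vocabulary size.
• In addition, when comparing Sparse Model, Random Projection, and Attention Model (Cross Enc):
  • Random Projection performs quite well even at low dimensions.
  • For the Attention Model, the required dimension is small, but a lot of computation is needed.

Analyzing Dual Encoder Retrieval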
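
Mathematical Rep for Encoders
• Assume 1-hot representations q and d of the query and the document, and an encoder function f applied to them.
• Assume scores are assigned by similarity: $\langle q, d \rangle$, $\langle f(q), f(d) \rangle$
[Figure: the tokens of an example query and document about Walt Disney are turned into vectors q and d, and an encoder (LSTM, BERT, …) maps them to f(q) and f(d).]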
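
Rank Preservation over Dense Model
• For documents d1 and d2, a "rank-preserving" encoder function f satisfies:
  $\langle q, d_1 \rangle > \langle q, d_2 \rangle \Rightarrow \langle f(q), f(d_1) \rangle > \langle f(q), f(d_2) \rangle$
• An "ε-precise" encoder function satisfies:
  $\left| \|f(q) - f(d)\|^2 - \|q - d\|^2 \right| \le \epsilon \cdot \|q - d\|^2$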
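
Both conditions can be checked numerically for a concrete encoder. The sketch below is a minimal illustration, assuming dense Python lists as vectors and reading the norms in the ε-precise condition as squared Euclidean distances; `preserves_rank` and `is_eps_precise` are hypothetical helper names.

```python
def inner(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def preserves_rank(f, q, d1, d2):
    """Check <q,d1> > <q,d2>  =>  <f(q),f(d1)> > <f(q),f(d2)>."""
    if inner(q, d1) > inner(q, d2):
        return inner(f(q), f(d1)) > inner(f(q), f(d2))
    return True  # premise is false, so the implication holds trivially

def is_eps_precise(f, q, d, eps):
    """Check | ||f(q)-f(d)||^2 - ||q-d||^2 | <= eps * ||q-d||^2."""
    orig = sq_dist(q, d)
    return abs(sq_dist(f(q), f(d)) - orig) <= eps * orig

# Example encoder: keep only the first two coordinates (a crude projection).
truncate = lambda v: v[:2]
q, d1, d2 = [1, 0, 1], [1, 0, 1], [1, 1, 0]
print(preserves_rank(truncate, q, d1, d2))  # the dropped coordinate carried the ranking signal
```

In this example the truncating encoder discards the one coordinate that separates d1 from d2, so the original ordering is lost.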
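
Rank Preservation over Dense Model
• If an encoder has an error rate that is inversely proportional to the magnitude of the vocabulary components, that encoder preserves rank.
• Put simply, the larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.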

Rank Preservation over Projection
• Projection method: a technique that reduces a vector by mapping it onto hyperplanes.
• The vector is described by whether it lies on the positive side or the negative side of each hyperplane.
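
Rank Preservation over Projection
• Encoder f with contraction mapping
• Suppose a contraction mapping (encoder) f is represented by a single matrix (i.e., f(x) = Ax) and is used to map n documents.
• Rademacher Embedding, Gaussian Embedding: each component of A is chosen at random.
• The reduced vector size k needed to obtain a good A is inversely proportional to $\epsilon^2/2 - \epsilon^3/3$ and proportional to $\log(n)$.
• Put simply, the reduced dimension has to be reasonably large, and it depends on the number of documents.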
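
The dependence of k on ε and n can be made concrete with a short sketch. The slide only states the proportionality; the constant 4 below is an assumption taken from the standard Johnson-Lindenstrauss bound $k \ge 4 \ln(n) / (\epsilon^2/2 - \epsilon^3/3)$. The function names are hypothetical.

```python
import math
import random

def jl_min_dim(n, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))

def rademacher_matrix(k, m, rng):
    """k x m matrix A with entries drawn uniformly from {-1/sqrt(k), +1/sqrt(k)}."""
    s = 1 / math.sqrt(k)
    return [[rng.choice((-s, s)) for _ in range(m)] for _ in range(k)]

def project(A, x):
    """Compute f(x) = Ax for a dense vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# k grows only logarithmically with the number of documents n,
# but quickly as the tolerated distortion eps shrinks.
for n in (10 ** 3, 10 ** 6):
    for eps in (0.5, 0.1):
        print(n, eps, jl_min_dim(n, eps))
```

A Gaussian embedding would follow the same pattern with entries drawn from a normal distribution scaled by 1/sqrt(k) instead of random signs.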
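
Rank Preservation over Sparse Model
• Consider the TF-IDF model: $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$
• TF-IDF similarity can be written in the form $\langle \bar{q}, d \rangle$.
• For BM25, it can be written in the form $\langle \bar{q}, \bar{d} \rangle = \mathrm{BM25}(q, d)$.
[Figure: the query vector q is reweighted by IDF to give q̄.]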
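
The inner-product form of TF-IDF scoring can be sketched as follows. This is a minimal illustration under stated assumptions: documents are pre-tokenized lists, and IDF is computed as log(n / df), which is one common variant and not necessarily the one used in the talk. All function names are hypothetical.

```python
import math
from collections import Counter

def idf_weights(docs, vocab):
    """IDF_i = log(n / df_i), an assumed variant; 0.0 for unseen terms."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [math.log(n / df[t]) if df[t] else 0.0 for t in vocab]

def count_vector(tokens, vocab):
    """Term-count vector over a fixed vocabulary order."""
    c = Counter(tokens)
    return [c[t] for t in vocab]

def tfidf_score(q_tokens, d_tokens, vocab, idf):
    q = count_vector(q_tokens, vocab)
    d = count_vector(d_tokens, vocab)
    q_bar = [qi * w for qi, w in zip(q, idf)]     # q_bar_i = q_i * IDF_i
    return sum(a * b for a, b in zip(q_bar, d))   # <q_bar, d>

docs = [["walt", "disney", "born"], ["walt", "whitman", "poet"], ["disney", "park"]]
vocab = sorted({t for d in docs for t in d})
idf = idf_weights(docs, vocab)
scores = [tfidf_score(["walt", "born"], d, vocab, idf) for d in docs]
```

Because the IDF weights are folded into the query vector, the document side stays a plain count vector, which is what lets the score be treated as an ordinary inner product in the analysis.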
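
Rank Preservation over Sparse Model
• Before the analysis, define the normalized margin term (the larger it is, the more the two documents differ):
  $\delta(q, d_1, d_2) = \frac{q \cdot (d_1 - d_2)}{\|q\| \cdot \|d_1 - d_2\|}$
• Suppose there is a matrix A that expresses TF-IDF as a contraction mapping (i.e., $\bar{q} = Aq$, $\bar{d} = Ad$).
• The mapping error rate is then proportional to $4\exp(-k(\delta^2 - \delta^3)/4)$.
• Put simply, reducing the error rate requires a sufficiently large dimension.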
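
To see how the margin and the dimension interact, the two expressions can be evaluated directly. This sketch takes the error expression exactly as it appears on the slide, $4\exp(-k(\delta^2 - \delta^3)/4)$, without checking it against the paper; the function names are hypothetical.

```python
import math

def normalized_margin(q, d1, d2):
    """delta(q, d1, d2) = q . (d1 - d2) / (||q|| * ||d1 - d2||)."""
    diff = [a - b for a, b in zip(d1, d2)]
    num = sum(qi * x for qi, x in zip(q, diff))
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_diff = math.sqrt(sum(x * x for x in diff))
    return num / (norm_q * norm_diff)

def error_bound(k, delta):
    """Slide's expression: 4 * exp(-k * (delta^2 - delta^3) / 4)."""
    return 4 * math.exp(-k * (delta ** 2 - delta ** 3) / 4)

# A small margin needs a much larger k before the expression becomes small.
for delta in (0.3, 0.1):
    for k in (100, 1000, 10000):
        print(delta, k, error_bound(k, delta))
```

Values above 1 mean the expression is uninformative at that k; it only becomes a useful guarantee once k is large relative to 1/δ².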

Rank Preservation over Attention Model
• For x = (x1, x2, …) and y = (y1, y2, …), the inner product using cross-attention is written as shown on the slide.
• Put simply, the dimension that has to be secured is proportional to the square of the query token length.
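
Rank Preservation Summary
• The larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.
• When reducing dimension in the form Ax, the reduced size has to be reasonably large, and it is proportional to the log of the number of documents.
• For TF-IDF-style sparse models, reducing the error rate requires a sufficiently large dimension.
• With cross-attention, the required dimension is proportional to the square of the query token length.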

Experiment

Experiment 1: ICT over Wikipedia
• Method: Inverse Cloze Test
  • Split a passage into several parts.
  • Use one part as the query and the remaining parts as the document.
  • The paper uses Wikipedia to generate about 1M queries.
  • Recall is measured for both ranking and retrieval.
• Compared systems:
  • Cross-Attention, Sum-of-max
  • Dual-Encoder BERT
  • Multi-Vector BERT
  • Sparse Model (BM25)
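
The Inverse Cloze Test construction described above can be sketched as a small data-generation routine. This is a minimal illustration, assuming each passage is already split into sentences; `ict_pairs` is a hypothetical name and the sampling details of the actual experiment are not given in the slides.

```python
import random

def ict_pairs(passages, rng):
    """Build (query, document) pairs: one sentence out, the rest stay in.

    passages: list of passages, each a list of sentence strings.
    rng: a random.Random instance, passed in for reproducibility.
    """
    pairs = []
    for sentences in passages:
        if len(sentences) < 2:
            continue  # nothing would be left to serve as the document
        i = rng.randrange(len(sentences))
        query = sentences[i]
        document = " ".join(sentences[:i] + sentences[i + 1:])
        pairs.append((query, document))
    return pairs

passages = [
    ["Walt Disney was born in 1901.", "He founded a studio.", "It made animated films."],
    ["A passage with a single sentence."],
]
pairs = ict_pairs(passages, random.Random(0))
```

Because the query is removed from its own passage, a retriever cannot succeed by exact string match alone and has to rely on the surrounding context.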

Experiment 1: ICT over Wikipedia
• Results
  • The smaller the embedding size, the lower the performance, and the effect is most pronounced in retrieval.
  • In retrieval, BM25 and Multi-Vector do better than expected.

Experiment 2: Open-Domain QA
• Method: Natural Questions Dataset
  • Consists of a set of questions that ask about actual Wikipedia content.
  • Trained on 87,925 examples and evaluated on 3,610.
• Compared systems:
  • Cross-Attention, Sum-of-max
  • Dual-Encoder BERT, Hybrid Dual-Encoder BERT (linear combination of sparse and dense)
  • Multi-Vector BERT
  • Sparse Model (BM25)
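
The hybrid system is described only as a linear combination of sparse and dense scores. One way this could look is sketched below; the exact form `dense + lam * sparse`, the weight name `lam`, and the candidate dictionaries are assumptions for illustration, not details taken from the slides.

```python
def hybrid_score(dense_score, sparse_score, lam):
    """Assumed linear combination: dense + lam * sparse."""
    return dense_score + lam * sparse_score

def rank(candidates, lam):
    """Sort candidate dicts (keys: 'id', 'dense', 'sparse') best first."""
    return sorted(
        candidates,
        key=lambda c: -hybrid_score(c["dense"], c["sparse"], lam),
    )

candidates = [
    {"id": "a", "dense": 0.9, "sparse": 1.0},  # semantically close, little term overlap
    {"id": "b", "dense": 0.7, "sparse": 5.0},  # strong exact term overlap
]
print([c["id"] for c in rank(candidates, lam=0.0)])  # dense only
print([c["id"] for c in rank(candidates, lam=0.1)])  # sparse signal mixed in
```

With `lam` set to zero the ranking is purely dense; raising it lets exact term matches from the sparse model override small differences in the dense score.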
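
Experiment 2: Open-Domain QA
• Results
  • Models with small embeddings still perform worse.
  • On real human questions, BM25 does not do well, i.e., it lacks generalization ability.
  • In addition, compared with ICT, the difference between methods matters far more than the embedding size.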

Experiment 3: Short Answer Exact Match
• Method: Natural Questions Dataset
  • Same as Experiment 2, but measures the cases where the answer matches exactly.
• Compared systems:
  • DE-BERT, Hybrid-BERT, Multi-BERT (Best Dense)
  • Sparse Model (BM25)
• Results:
  • The hybrid model performs well, and does best when 200 tokens are considered.
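
Summary & Intuition
• Summary
  • The smaller the embedding size, the lower the performance, most noticeably on data that does not require generalization (ICT).
  • Sparse models perform worse on data that requires generalization (Open-Domain QA).
  • The hybrid model shows high performance overall, i.e., it can take the strengths of both sides.
• Intuition
  • Embedding search is expected to work better than the current BM25 approach.
  • Because our data demands a lot of generalization, using the hybrid model seems like a reasonable choice.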