Sparse, Dense, and Attentional Representations for Text Retrieval
Scatter Lab Inc.
August 28, 2020
Research
Sparse, Dense, and Attentional Representations for Text Retrieval
구상준, ML Scientist
Contents
• Introduction
• Analyzing Dual Encoder Retrieval
• Rank Preservation over Dense Model (Projection)
• Rank Preservation over Sparse Model
• Rank Preservation over Attention Model
• Experiment & Analysis
Introduction
Using Encoders for the Retrieval Task
• Given a query, how can we pick out the passages relevant to it?
• A sparse model such as TF-IDF selects a first-stage set of candidate documents.
• The query and each candidate document are mapped to vectors by a dense encoder, and those vectors are used to extract the answer.
Contextualized Sparse Representation with Rectified N-Gram Attention for Open-Domain Question Answering, Lee et al., ICLR 2019
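The two-stage pipeline on this slide (sparse first pass, dense second pass) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline from the cited paper: the IDF smoothing `log(n/df) + 1` is one common variant, and `toy_dense_encode` with its hand-made `TOY_EMBEDDINGS` table is a hypothetical stand-in for a trained dense encoder.

```python
import math
from collections import Counter

# Hypothetical toy "dense encoder": sums hand-made 2-d word vectors.
# A real system would use a trained neural encoder (e.g., BERT) here.
TOY_EMBEDDINGS = {"cat": (1.0, 0.0), "kitten": (0.9, 0.0), "dog": (0.0, 1.0)}


def toy_dense_encode(tokens):
    vec = [0.0, 0.0]
    for t in tokens:
        x, y = TOY_EMBEDDINGS.get(t, (0.0, 0.0))  # unknown words map to zero
        vec[0] += x
        vec[1] += y
    return vec


def tfidf_index(docs):
    """Sparse TF-IDF vectors as {term: weight} dicts, plus the IDF table."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # one common smoothed variant
    vecs = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf


def sparse_dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def two_stage_search(query, docs, encode, num_candidates):
    doc_vecs, idf = tfidf_index(docs)
    q_sparse = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    # Stage 1: cheap sparse scoring over every document; keep the top candidates.
    candidates = sorted(range(len(docs)),
                        key=lambda i: sparse_dot(q_sparse, doc_vecs[i]),
                        reverse=True)[:num_candidates]
    # Stage 2: rerank only those candidates by dense inner product.
    q_dense = encode(query)

    def dense_score(i):
        return sum(a * b for a, b in zip(q_dense, encode(docs[i])))

    return sorted(candidates, key=dense_score, reverse=True)


docs = [["cat", "sat", "mat"], ["dog", "barked"], ["cat", "kitten"]]
print(two_stage_search(["cat"], docs, toy_dense_encode, num_candidates=2))  # [2, 0]
```

Here the query "cat" ties documents 0 and 2 in the sparse stage; the dense stage then ranks document 2 first because the toy vector for "kitten" points the same way as "cat". The expensive encoder only ever sees the few candidates that survive stage 1.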
Bi-Encoder for Retrieval
• Cross-Encoder vs. Bi-Encoder (Dual Encoder)
• Cross-Encoder: feeds the query and a candidate together as a single input and classifies the pair.
• Bi-Encoder: maps the query and the candidate with separate encoders, then computes the similarity between the resulting vectors.
Poly-encoders: architectures and pre-training strategies for fast and accurate multi-sentence scoring, Humeau et al., ICLR 2020
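To make the cost difference between the two designs concrete, the sketch below counts model invocations. `encode` and `cross_score` are hypothetical stand-ins (bag-of-words counts and token overlap) for real neural encoders; only the call pattern is meant to carry over: a bi-encoder encodes each document once and each query once, while a cross-encoder runs once per (query, document) pair.

```python
from collections import Counter

CALLS = Counter()            # counts model invocations to compare costs
VOCAB = ["cat", "dog", "mat"]


def encode(tokens):
    """Hypothetical single-text encoder: bag-of-words counts over VOCAB."""
    CALLS["encode"] += 1
    return [tokens.count(w) for w in VOCAB]


def cross_score(query, doc):
    """Hypothetical cross-encoder: scores the pair jointly (here, shared-token count)."""
    CALLS["cross"] += 1
    return len(set(query) & set(doc))


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


def bi_encoder_top1(queries, docs):
    doc_vecs = [encode(d) for d in docs]   # once per document; can be indexed offline
    results = []
    for q in queries:
        qv = encode(q)                     # once per query
        results.append(argmax([sum(a * b for a, b in zip(qv, dv)) for dv in doc_vecs]))
    return results


def cross_encoder_top1(queries, docs):
    # One joint model call for every (query, document) pair.
    return [argmax([cross_score(q, d) for d in docs]) for q in queries]


docs = [["cat", "mat"], ["dog"], ["cat", "dog", "dog"]]
queries = [["dog"], ["mat"]]
print(bi_encoder_top1(queries, docs), CALLS["encode"])     # [2, 0] 5
print(cross_encoder_top1(queries, docs), CALLS["cross"])   # [1, 0] 6
```

With Q queries and N documents the bi-encoder makes N + Q encoder calls, and the N document vectors can be precomputed; the cross-encoder makes Q × N calls, which is why it is usually reserved for reranking a short candidate list.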
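Effectiveness of Density
• Dense models generally outperform sparse models…
• …but on long documents this was found not always to hold.
• Why would that be?
• Is the embedding dimension too small to hold the document (Capacity)?
• Or does the model fail to generalize over documents (Generality)?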
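Effectiveness of Density
• Core claim of the paper
• For a dense model to outperform a sparse model, its embedding dimension has to be large.
• The required dimension depends on the document length and the vocabulary size.
• Comparing a sparse model, random projection, and an attention model (cross-encoder):
• Random projection also performs reasonably well even at reduced dimensions, and
• The attention model needs a smaller dimension but requires much more computation.
Analyzing Dual Encoder Retrieval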
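Mathematical Representation for Encoders
• Assume 1-hot representations q and d of the query and the document, and an encoder function f applied to them.
• Assume documents are scored by similarity (inner product): $\langle q, d \rangle$ in the original space and $\langle f(q), f(d) \rangle$ after encoding.
[Figure: the query q and the document d, as sequences of 1-hot tokens, each pass through an encoder f (e.g., LSTM, BERT) to give f(q) and f(d).]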
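The setup on this slide can be written out directly: q and d are sums of 1-hot vectors over a vocabulary, scores are inner products, and a linear encoder f(x) = Ax maps both into a smaller space. The vocabulary, the token lists, and the 2 × 4 matrix below are hypothetical, chosen only to show that ⟨q, d⟩ and ⟨f(q), f(d)⟩ need not agree.

```python
def bag_of_one_hots(tokens, vocab):
    """Sum of the tokens' 1-hot vectors: entry i counts occurrences of vocab[i]."""
    position = {w: i for i, w in enumerate(vocab)}
    vec = [0] * len(vocab)
    for t in tokens:
        vec[position[t]] += 1
    return vec


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_linear_encoder(A):
    """Encoder f(x) = A x, with A given as a list of k rows of length |vocab|."""
    return lambda x: [inner(row, x) for row in A]


vocab = ["today", "weather", "nice", "cold"]
q = bag_of_one_hots(["today", "weather"], vocab)           # [1, 1, 0, 0]
d = bag_of_one_hots(["today", "weather", "cold"], vocab)   # [1, 1, 0, 1]
f = make_linear_encoder([[1, 0, 1, 0], [0, 1, 0, 1]])      # hypothetical 2 x 4 matrix
print(inner(q, d), inner(f(q), f(d)))                      # 2 3
```

The projected score is distorted because this A merges "cold" into the same coordinate as "weather"; the following slides ask how large k must be for such distortions not to change the document ranking.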
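Rank Preservation over Dense Model
• An encoder function f is “rank-preserving” for documents $d_1, d_2$ if it satisfies
$\langle q, d_1 \rangle > \langle q, d_2 \rangle \Rightarrow \langle f(q), f(d_1) \rangle > \langle f(q), f(d_2) \rangle$
• An encoder function f is “ε-precise” for documents $d_1, d_2$ if, for each $d \in \{d_1, d_2\}$, it satisfies
$\bigl|\, \|f(q) - f(d)\|^2 - \|q - d\|^2 \,\bigr| \le \epsilon \cdot \|q - d\|^2$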
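Both definitions can be checked numerically for a given encoder. A minimal sketch, assuming vectors are plain Python lists and `f` is any callable; `keep_first` is a hypothetical encoder that keeps only the first coordinate, used to show how a crude projection can break both properties.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def preserves_rank(f, q, d1, d2):
    """If <q, d1> > <q, d2>, check that f keeps <f(q), f(d1)> > <f(q), f(d2)>."""
    if inner(q, d1) <= inner(q, d2):
        return True  # premise does not hold, so there is no ordering to preserve
    fq = f(q)
    return inner(fq, f(d1)) > inner(fq, f(d2))


def is_eps_precise(f, q, d, eps):
    """Check | ||f(q) - f(d)||^2 - ||q - d||^2 | <= eps * ||q - d||^2."""
    original = sq_dist(q, d)
    return abs(sq_dist(f(q), f(d)) - original) <= eps * original


def identity(x):
    return x


def keep_first(x):
    return x[:1]  # hypothetical crude "projection" down to one dimension


q, d1, d2 = [1, 1], [0, 2], [1, 0]
print(preserves_rank(identity, q, d1, d2), preserves_rank(keep_first, q, d1, d2))  # True False
print(is_eps_precise(keep_first, q, d1, 0.5), is_eps_precise(keep_first, q, d1, 0.4))  # True False
```

Here ⟨q, d1⟩ = 2 > ⟨q, d2⟩ = 1, but after `keep_first` the scores become 0 and 1, so the order flips; the same encoder halves the squared distance between q and d1, which is within ε = 0.5 but not ε = 0.4.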
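Rank Preservation over Dense Model
• If an encoder's error rate is inversely proportional to the magnitude of the vocabulary components, the encoder preserves rank.
• Put simply: the larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.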
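Rank Preservation over Projection
• Projection method: a technique that reduces dimensionality by mapping vectors with respect to hyperplanes.
• Each vector is described by which side of a hyperplane, the positive or the negative half-space, it falls on.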
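Rank Preservation over Projection
• Encoder f with contraction mapping
• Suppose f is a contraction mapping (encoder) expressed as a single matrix, i.e., $f(x) = Ax$, and it is used to map n documents.
• Rademacher embedding, Gaussian embedding: the entries of A are chosen at random (random signs for Rademacher, normal samples for Gaussian).
• The reduced dimension k needed for a good A is inversely proportional to $\epsilon^2/2 - \epsilon^3/3$ and proportional to $\log n$:
$k \propto \dfrac{\log n}{\epsilon^2/2 - \epsilon^3/3}$
• Put simply: the reduced dimension has to be reasonably large, and how large depends on the number of documents.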
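The bound can be explored numerically. The sketch below assumes the Dasgupta and Gupta form of the Johnson-Lindenstrauss lemma, k ≥ 4 ln n / (ε²/2 − ε³/3), which matches the expression on the slide, and uses a Rademacher matrix with entries ±1/√k. `fraction_eps_precise` is a helper introduced here to measure how many document pairs stay ε-precise after projection; the random points stand in for documents.

```python
import math
import random


def jl_dimension(n, eps):
    """Smallest integer k with k >= 4 ln(n) / (eps^2/2 - eps^3/3)."""
    return math.ceil(4 * math.log(n) / (eps ** 2 / 2 - eps ** 3 / 3))


def rademacher_matrix(k, v, rng):
    """k x v matrix whose entries are independently +1/sqrt(k) or -1/sqrt(k)."""
    s = 1.0 / math.sqrt(k)
    return [[s if rng.random() < 0.5 else -s for _ in range(v)] for _ in range(k)]


def project(A, x):
    """f(x) = A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def fraction_eps_precise(points, A, eps):
    """Share of point pairs whose squared distance changes by at most eps (relative)."""
    projected = [project(A, p) for p in points]
    ok = total = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            before = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            after = sum((a - b) ** 2 for a, b in zip(projected[i], projected[j]))
            ok += abs(after - before) <= eps * before
            total += 1
    return ok / total


rng = random.Random(0)
n, v, eps = 20, 500, 0.5
k = jl_dimension(n, eps)                 # 144 for these values
points = [[rng.random() for _ in range(v)] for _ in range(n)]
A = rademacher_matrix(k, v, rng)
print(k, fraction_eps_precise(points, A, eps))
```

The 1/√k scaling makes the expected squared length of Ax equal to that of x, so ε measures relative distortion. Doubling the number of documents only adds a constant to k (through log n), whereas halving ε roughly quadruples it.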
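Rank Preservation over Sparse Model
• Consider the TF-IDF model: $\bar{q}_i = q_i \cdot \mathrm{IDF}_i$
[Figure: the query's term vector q is reweighted by IDF to give $\bar{q}$.]
• TF-IDF similarity can be written in the form $\langle \bar{q}, d \rangle$.
• BM25 can be written in the form $\langle \bar{q}, \bar{d} \rangle = \mathrm{BM25}(q, d)$.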
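Both forms can be sketched as inner products over term dictionaries. Assumptions: the IDF uses the Lucene-style smoothing log(1 + (N − df + 0.5)/(df + 0.5)), and k1 = 1.2, b = 0.75 are common default BM25 parameters, not values given in the slides. The point of the sketch is that BM25 folds its term-frequency saturation and length normalization into d̄, so the score is still an inner product ⟨q̄, d̄⟩.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Lucene-style BM25 IDF: log(1 + (N - df + 0.5) / (df + 0.5))."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) for t in df}


def q_bar(query, idf):
    """q̄_i = q_i * IDF_i: query term counts reweighted by IDF."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}


def tfidf_score(query, doc, idf):
    """TF-IDF similarity as <q̄, d>, where d holds raw term counts."""
    d = Counter(doc)
    return sum(w * d[t] for t, w in q_bar(query, idf).items())


def d_bar(doc, avgdl, k1=1.2, b=0.75):
    """Saturated, length-normalized term frequencies (the BM25 document side)."""
    dl = len(doc)
    norm = k1 * (1 - b + b * dl / avgdl)
    return {t: tf * (k1 + 1) / (tf + norm) for t, tf in Counter(doc).items()}


def bm25_score(query, doc, idf, avgdl):
    """BM25(q, d) written as the inner product <q̄, d̄>."""
    db = d_bar(doc, avgdl)
    return sum(w * db.get(t, 0.0) for t, w in q_bar(query, idf).items())


docs = [["a", "b", "b"], ["a", "c"]]
idf = idf_weights(docs)
avgdl = sum(len(d) for d in docs) / len(docs)
print(tfidf_score(["b"], docs[0], idf), bm25_score(["b"], docs[0], idf, avgdl))
```

Because each entry of d̄ is capped at k1 + 1, repeating a term many times in a document raises its BM25 contribution only up to a limit, unlike the raw counts used in ⟨q̄, d⟩.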
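Rank Preservation over Sparse Model
• The paper first defines a normalized margin term (the larger it is, the more clearly the two documents are separated for the query):
$\delta(q, d_1, d_2) = \dfrac{q \cdot (d_1 - d_2)}{\|q\| \cdot \|d_1 - d_2\|}$
• If a matrix A expresses TF-IDF as a contraction mapping ($\bar{q} = Aq$, $\bar{d} = Ad$),
• the probability of a ranking error is bounded by $4\exp\bigl(-k(\delta^2 - \delta^3)/4\bigr)$.
• Put simply: to reduce the error rate, a certain minimum dimension must be guaranteed.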
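The normalized margin and the error bound from this slide can be computed directly. A minimal sketch, assuming 0 < δ < 1 so that δ² − δ³ > 0; `min_dimension` is a hypothetical helper that inverts the bound to find the smallest k reaching a target error probability.

```python
import math


def normalized_margin(q, d1, d2):
    """delta(q, d1, d2) = q . (d1 - d2) / (||q|| * ||d1 - d2||)."""
    diff = [a - b for a, b in zip(d1, d2)]
    numerator = sum(a * b for a, b in zip(q, diff))
    q_norm = math.sqrt(sum(a * a for a in q))
    diff_norm = math.sqrt(sum(a * a for a in diff))
    return numerator / (q_norm * diff_norm)


def error_bound(k, delta):
    """The slide's bound on the ranking-error probability: 4 exp(-k (delta^2 - delta^3) / 4)."""
    return 4 * math.exp(-k * (delta ** 2 - delta ** 3) / 4)


def min_dimension(delta, target):
    """Smallest k with error_bound(k, delta) <= target (assumes 0 < delta < 1)."""
    return math.ceil(4 * math.log(4 / target) / (delta ** 2 - delta ** 3))


delta = normalized_margin([1, 0], [1, 0], [0, 1])   # 1/sqrt(2), about 0.707
print(delta, min_dimension(0.1, 0.01))              # a small margin needs a large k
```

Since the exponent scales with δ², shrinking the margin by a factor of 10 raises the required dimension by roughly a factor of 100, which matches the slide's point that closely competing documents demand a large embedding size.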
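Rank Preservation over Attention Model
• For $x = (x_1, x_2, \dots)$ and $y = (y_1, y_2, \dots)$, the paper expresses the inner product computed with cross-attention in terms of the individual token vectors.
• Put simply: the dimension that must be guaranteed is proportional to the square of the query length in tokens.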
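Rank Preservation Summary
• The larger the vocabulary and the more documents and queries there are, the harder it is to build a good encoder.
• When the dimension is reduced in the form Ax, the reduced dimension must be reasonably large, and it grows with the log of the number of documents.
• For TF-IDF-style sparse models, keeping the error rate low requires a guaranteed minimum dimension.
• With cross-attention, the required dimension is proportional to the square of the query length in tokens.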
Experiment
Experiment: ICT over Wikipedia
• Method: Inverse Cloze Test (ICT)
• Split a passage into several parts.
• Use one part as the query and the remaining parts as the document.
• The paper uses Wikipedia to generate 1M queries.
• Recall is measured for both ranking and retrieval.
• Baselines:
• Cross-Attention, Sum-of-max
• Dual-Encoder BERT
• Multi-Vector BERT
• Sparse Model (BM25)
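A simplified version of the Inverse Cloze Test setup and of the recall metric, as a sketch: passages are assumed to be pre-split into sentences, exactly one sentence is held out as the query, and `make_ict_pairs` and `recall_at_k` are illustrative names, not code from the paper.

```python
import random


def make_ict_pairs(passages, rng):
    """Simplified Inverse Cloze Test: for each passage (a list of sentences),
    one random sentence becomes the pseudo-query and the remaining sentences
    form the pseudo-document it should retrieve."""
    pairs = []
    for sentences in passages:
        if len(sentences) < 2:
            continue  # need at least one sentence left over as the document
        i = rng.randrange(len(sentences))
        query = sentences[i]
        document = " ".join(sentences[:i] + sentences[i + 1:])
        pairs.append((query, document))
    return pairs


def recall_at_k(rankings, gold_ids, k):
    """Fraction of queries whose gold document appears among the top-k ranked ids."""
    hits = sum(gold in ranked[:k] for ranked, gold in zip(rankings, gold_ids))
    return hits / len(gold_ids)


rng = random.Random(0)
print(make_ict_pairs([["A.", "B.", "C."], ["Lonely."]], rng))
print(recall_at_k([[3, 1, 2], [0, 2, 1]], gold_ids=[1, 0], k=1))  # 0.5
```

Because the query is cut out of its own passage, no human labels are needed, which is what makes a million-query training and evaluation set cheap to build from Wikipedia.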
Experiment: ICT over Wikipedia
• Results
• The smaller the embedding size, the lower the performance, and the drop is most pronounced in retrieval.
• For retrieval, BM25 and Multi-Vector do better than expected.
Experiment: Open-Domain QA
• Method: Natural Questions dataset
• Consists of real questions about Wikipedia content.
• Trained on 87,925 examples and evaluated on 3,610.
• Baselines:
• Cross-Attention, Sum-of-max
• Dual-Encoder BERT, Hybrid Dual-Encoder BERT (a linear combination of sparse and dense)
• Multi-Vector BERT
• Sparse Model (BM25)
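The hybrid baseline combines sparse and dense scores linearly. The sketch below assumes the form dense + weight · sparse with a single tuned `weight`; the exact combination and tuning procedure used in the paper may differ, and the score lists are made-up inputs.

```python
def hybrid_scores(sparse_scores, dense_scores, weight):
    """Per-candidate linear combination: dense + weight * sparse."""
    return [d + weight * s for s, d in zip(sparse_scores, dense_scores)]


def rank(scores):
    """Candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)


sparse = [2.0, 0.0, 1.0]   # e.g., BM25 scores for three candidates
dense = [0.1, 0.9, 0.5]    # e.g., dual-encoder inner products
print(rank(hybrid_scores(sparse, dense, weight=0.0)))  # [1, 2, 0]
print(rank(hybrid_scores(sparse, dense, weight=0.5)))  # [0, 2, 1]
```

Because BM25 scores and inner products live on different scales, the weight has to be chosen on held-out data; with weight = 0 the ranking is purely dense, and a large weight lets exact term matches dominate.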
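Experiment: Open-Domain QA
• Results
• As before, smaller embedding sizes lower model performance.
• On real human questions, BM25 performs poorly, i.e., it lacks the ability to generalize.
• Also, compared with ICT, the choice of method matters far more than the embedding size.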
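Experiment: Short Answer Exact Match
• Method: Natural Questions dataset
• Same setup as Experiment 2, except that it measures how often the predicted answer matches the gold answer exactly.
• Baselines:
• DE-BERT, Hybrid-BERT, Multi-BERT (best dense)
• Sparse Model (BM25)
• Results:
• The hybrid model performs well, and does best when 200 tokens are considered.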
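Exact match is usually computed after light answer normalization. The sketch below assumes SQuAD-style normalization (lowercasing, stripping punctuation and the English articles a/an/the, collapsing whitespace); the slides do not say which normalization was used in this experiment.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1 if the normalized prediction equals any normalized gold answer, else 0."""
    pred = normalize_answer(prediction)
    return int(any(pred == normalize_answer(g) for g in gold_answers))


print(exact_match("The Eiffel Tower!", ["Eiffel Tower"]))  # 1
print(exact_match("Paris, France", ["Paris"]))            # 0
```

Accepting any of several gold strings matters for Natural Questions, where a question can have more than one reference answer; the corpus-level score is the mean of these 0/1 values.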
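Summary & Intuition
• Summary
• Smaller embedding sizes lower performance, and the drop is most pronounced on data that does not require generalization (ICT).
• Sparse models perform worse on data that requires generalization (Open-Domain QA).
• The hybrid model shows high performance overall, i.e., it can take the strengths of both approaches.
• Intuition
• Embedding-based search is expected to outperform our current BM25 approach.
• Since our data demands a lot of generalization, adopting a hybrid model is likely to bring a substantial benefit.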