Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
Scatter Lab Inc.
August 07, 2020
Transcript
ML Seminar S6E3: Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval
Junseong Kim, ML Research Scientist, Pingpong
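Contents
1. Introduction
   1. Problem definition
   2. Limitations of existing methods
2. Approach
   1. Introducing the approach
   2. Asynchronous training routine
3. Experiment
   1. Experimental design
   2. Experimental results
   3. Implementation details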
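Problem definition [1/2]
• The problem this paper ultimately aims to solve is the Open-Domain Question Answering (QA) task.
• Open-Domain QA can be defined as follows: given a question that is not restricted to any particular domain, find the answer contained somewhere in the collection of documents we hold (~1M+).
• For example, assuming we can consult every document that exists on Wikipedia, the task is to find the answer to "What percentage of life did Thanos wipe out?"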
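Problem definition [2/2]
• Deep learning models can find more accurate answers, but running them over every document (N hundred million or more) is very inefficient and makes real-time service impossible.
• To get past this speed limit, prior work splits the problem into two broad stages:
• 1. Document Retrieval: the stage that finds the documents relevant to a question.
• 2. Reading Comprehension: a model that derives a concrete answer to the question by consulting the relevant documents.
• The paper introduced today proposes a method for improving Document Retrieval performance.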
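Limitations of existing methods [1/3]
• Most prior work relies mainly on lexical features for Document Retrieval.
• Examples: BM25, TF-IDF, keyword matching, and so on (features built into Elastic Search).
• These methods cannot understand the implied meaning (semantics) of a question, so they cannot find the relevant answer when the wording differs.
• Example: Q. "Who is Tesla's [...]?" -> searching with the keywords (Tesla, [...]) cannot find a document that contains both keywords.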
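The keyword-matching failure above can be illustrated with a small sketch. The second keyword of the slide's example did not survive extraction, so the term "boss" below is a hypothetical stand-in, and `keyword_match` is an illustrative helper, not the API of Elastic Search or any other search engine.

```python
def keyword_match(query_terms, document):
    """Return True only if every query term appears as a token in the document."""
    tokens = set(document.lower().split())
    return all(term.lower() in tokens for term in query_terms)


# Hypothetical corpus: the first document answers the question
# but never uses the word "boss".
docs = [
    "Elon Musk is the CEO of Tesla",
    "The boss of Tesla gave a talk",
]

query_terms = ["tesla", "boss"]  # "boss" is an assumed keyword
for doc in docs:
    print(keyword_match(query_terms, doc), "|", doc)
```

The first document is rejected even though it contains the answer, which is the gap that dense (semantic) retrieval tries to close.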
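Limitations of existing methods [2/3]
• Recent work (Lee et al., 2019; Guu et al., 2020; Seo et al., 2019) proposes encoding the question and the documents into representations with BERT, so that more semantic information can be captured.
• These methods use a Bi-Encoder model and are trained with In-Batch Negatives.
• Once training is finished, the Document Encoder is used to encode the documents in advance.
• At inference time, only the question is encoded with BERT, and an Approximate Nearest Neighbor Search tool such as FAISS directly finds the Top-K documents closest to the question representation.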
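The inference flow described above (pre-encoded documents, one query encoding, Top-K lookup) can be sketched as follows. This is a minimal illustration under stated assumptions: the vectors are hand-written toy values rather than BERT outputs, and an exact brute-force inner-product scan stands in for an approximate index such as FAISS.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents with the highest inner product.

    Exact search over every document; an ANN index would approximate this
    ranking without scanning the whole collection.
    """
    scored = [(dot(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]


# Toy document representations, computed once ahead of time (assumed values).
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.7, 0.3, 0.0],
    [0.0, 0.1, 0.9],
]
# Toy question representation, computed at inference time (assumed values).
query_vec = [1.0, 0.2, 0.0]

print(top_k(query_vec, doc_vecs, k=2))
```

Only the query is encoded per request, which is why this setup can be much faster than running a full model over every candidate document.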
[Figure: Bi-encoder; surviving label: document]
Training method: In-Batch Negative
• Training dataset: paired questions and documents (Q1, D1), (Q2, D2), (Q3, D3), (Q4, D4).
• The batch is encoded into Q: (4, 512) and D: (4, 512).
• Q ⋅ D^T -> (4, 4): one score for every question (rows Q1 to Q4) against every document (columns D1 to D4).
• Softmax is applied to each row.
• Training objective: in each row, the document paired with that question should get the highest value (in the slide's illustration, 0.99 for the matching entries and 0.01 for the rest).
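The training objective above can be written as a row-wise softmax cross-entropy whose target is the diagonal of Q ⋅ D^T. The following is a minimal sketch in plain Python, assuming the i-th document is the positive for the i-th question; `in_batch_negative_loss` is an illustrative name, and real training code would use a tensor library with automatic differentiation.

```python
import math


def in_batch_negative_loss(Q, D):
    """Mean cross-entropy over rows of Q . D^T with diagonal targets.

    Q and D are lists of equal-length vectors; D[i] is the positive for Q[i],
    and every other D[j] in the batch acts as a negative.
    """
    n = len(Q)
    total = 0.0
    for i in range(n):
        # Row i of the score matrix: question i against every document.
        scores = [sum(q * d for q, d in zip(Q[i], D[j])) for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the paired document.
        total += log_z - scores[i]
    return total / n


Q = [[5.0, 0.0], [0.0, 5.0]]
D_aligned = [[1.0, 0.0], [0.0, 1.0]]  # each question matches its own document
D_swapped = [[0.0, 1.0], [1.0, 0.0]]  # each question matches the other document

print(in_batch_negative_loss(Q, D_aligned))
print(in_batch_negative_loss(Q, D_swapped))
```

The loss is small when each question scores its own document highest and large when the pairing is swapped, which is the behavior the slide's before and after matrices illustrate.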
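Limitations of existing methods [2/3]
• The authors point out that the In-Batch Negatives used to train Dense Retrieval models are a problem.
• Their hypothesis: In-Batch Negative training is effective for narrowing the collection down to roughly similar documents, but it has a fundamental limit when it comes to pinpointing the relevant document.
• The reason is that learning to pick the one relevant document among completely unrelated candidates is different from learning to pick the one relevant document among candidates that are themselves related.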
Limitations of existing methods [2/3]
• The representations of the negative samples were visualized with t-SNE and analyzed.
• The Random and BM25-based negatives that were mainly used before have a distribution very different from that of the actual relevant documents.
• In addition, when Dense Retrieval was run with a model trained on Random Negatives, it failed to separate the actual relevant documents.
• "Which of these is the relevant document?" also has to be learned!
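Approach
• This paper proposes a new way of choosing the negative samples used during training.
• Approximate nearest neighbor Negative Contrastive Estimation (ANCE)
• The method builds hard negative samples from the retrieval results of the model as it stands partway through training.
• The faiss index is updated asynchronously every N steps, and the negative samples are refreshed continuously.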
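The negative-mining step described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: an exact brute-force search stands in for the faiss index, the embeddings are toy values, the top-ranked non-positive documents are taken directly as negatives, and `mine_hard_negatives` is a hypothetical helper name.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mine_hard_negatives(query_vec, doc_vecs, positive_ids, k):
    """Retrieve with the current embeddings and keep the top-k non-positives.

    Documents that the current model ranks highly but that are not labeled
    relevant are the "hard" negatives used for the next training steps.
    """
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: (-dot(query_vec, doc_vecs[i]), i),
    )
    return [i for i in ranked if i not in positive_ids][:k]


# Toy embeddings produced by the model at some training checkpoint (assumed).
doc_vecs = [
    [0.9, 0.1, 0.0],  # labeled positive for the query below
    [0.1, 0.8, 0.1],
    [0.7, 0.3, 0.0],  # close to the query but not relevant: a hard negative
    [0.0, 0.1, 0.9],
]
query_vec = [1.0, 0.2, 0.0]

# In the schedule described above, the document embeddings (and the index
# built on them) would be recomputed every N steps and this call repeated.
print(mine_hard_negatives(query_vec, doc_vecs, positive_ids={0}, k=2))
```

Because the negatives come from the model's own retrieval results, they track what the model currently confuses with the positive, unlike random or in-batch negatives.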
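Experiment
• The evaluation task is the TREC 2019 Deep Learning Track.
• It is a dataset in which relevant documents are labeled for more than one million queries submitted to the Bing search engine.
• The authors state that they chose this dataset because it is large, recent, and reflects the most realistic setting well.
• The evaluation metrics are MRR, Recall@1k, and NDCG.
• Most of the reported numbers measure Retrieval performance. In addition, the ability of the DR model to rerank the relevant documents within the top 100 candidates is also verified (the part labeled Rerank in the table).
• As in DPR, evaluation is also run on the datasets of the OpenQA task, which are QA datasets with no domain restriction. There the metric checks whether the target passage is actually included in the Top-N.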
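The retrieval metrics named above can be sketched in a few lines. This is a minimal illustration with hypothetical document ids and helper names; it covers MRR, Recall@k, and the Top-N hit check, and omits NDCG and any cutoff conventions of the official TREC tooling.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant_ids) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def top_n_hit(ranked_ids, relevant_ids, n):
    """True if at least one target passage is inside the top n."""
    return any(doc_id in relevant_ids for doc_id in ranked_ids[:n])


# Hypothetical ranking for one query, with two relevant documents.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(reciprocal_rank(ranked, relevant))
print(recall_at_k(ranked, relevant, k=2))
print(top_n_hit(ranked, relevant, n=1))
```

MRR rewards placing the first relevant document early, while Recall@k and the Top-N hit check only ask whether relevant documents made it into the candidate set at all.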
Experiment
• The existing approach is a Two-Stage method: Document Retrieval with BM25, followed by Reranking with BERT.
• Inference takes 1.42 seconds in total.
• In contrast, this paper uses ANN-based Dense Retrieval, so inference is faster.
-> Inference takes only 11.6ms, and it still shows higher performance than the Two-Stage method.
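Conclusion
• Training Dense Retrieval with In-Batch Negatives alone has a clear limit.
• It leaves the model weak at deciding the ranking among candidates.
• Training should assume that confusing candidate documents will appear, and should pull the correct document closest to the question.
• To that end, the paper proposes running ANN indexing during training, the same as at inference, and drawing the negatives from retrieval. Doing this asynchronously allows training to continue without interruption.
• In the experiments, the proposed training method showed better performance on real tasks.
• It was evaluated on a search Retrieval task and on Document Retrieval for Open-Domain QA.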
References
• https://codertimo.github.io/2020/07/20/ANN-negative-contrastive-learning/