Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

Unofficial slides introducing the paper.

izuna385

May 04, 2020
Transcript

  1. 2/18 Summary • Combines the merits of both the Bi-encoder and the Cross-encoder: the Bi-encoder is faster but does not consider cross-attention, while the Cross-encoder performs better thanks to cross-attention but is slow. • (Additional:) A pre-training strategy using datasets similar to the downstream tasks.
  2. 3/18 RQ and Solution • Research Question: How can we combine the merits of both the Bi-encoder and the Cross-encoder? • Solution: Cache the candidates and compute attention from the candidates to the contexts.
  3. 7/18 Encoder Structure (A.) Bi-encoder • The context and each candidate entity are encoded into [CLS] vectors separately, so the candidate vectors can be cached for fast search, but cross-attention cannot be considered (see the sketch below).
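A minimal sketch of Bi-encoder scoring, assuming a Hugging Face BERT encoder; the model name, texts, and helper function are placeholders for illustration, not the authors' implementation.

```python
# Bi-encoder scoring sketch (illustrative only): contexts and candidates are
# encoded independently, so candidate vectors can be precomputed and cached.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_cls(texts):
    # Encode each text separately and take its [CLS] vector.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0]            # (num_texts, hidden)

# Candidate vectors are computed once and cached for fast search.
candidate_vecs = encode_cls(["entity description 1", "entity description 2"])

# At query time only the context is encoded; scoring is a dot product,
# so there is no cross-attention between context and candidates.
context_vec = encode_cls(["mention context"])      # (1, hidden)
scores = context_vec @ candidate_vecs.T            # (1, num_candidates)
```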
  4. 8/18 Encoder Structure (B.) Cross-Encoder • Example: Zero-shot Entity Linking [Logeswaran et al., ACL’19]. Input: [CLS] mention context [ENT] candidate entity description, fed to BERT [Devlin et al., ‘18], where [ENT] is an embedding indicating the mention location.
  5. 9/18 Encoder Structure (B.) Cross-Encoder • For each generated candidate entity per mention, the joint input [CLS] mention context [ENT] candidate entity description is encoded [Devlin et al., ‘18; Logeswaran et al., ACL’19], and the [CLS] output is used for scoring.
  6. 10/18 Encoder Structure (B.) Cross-Encoder • Because the mention context and the candidate entity description are encoded jointly, this [CLS]-based scoring considers mention–entity cross-attention.
  7. 11/18 Encoder Structure (B.) Cross-Encoder • However, inference is slow, since each mention must be re-encoded jointly with every one of its candidates (see the sketch below).
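A minimal sketch of Cross-encoder scoring under the same assumptions as above; the [ENT] mention-location marker is omitted for brevity, and the scoring head is a placeholder that would be trained in practice.

```python
# Cross-encoder scoring sketch (illustrative only): the mention context and each
# candidate description are encoded jointly, so every candidate needs its own
# forward pass at inference time, which is slow but captures cross-attention.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
score_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # trained in practice

def cross_score(context, candidates):
    scores = []
    for cand in candidates:                        # one forward pass per candidate
        # Joint input "[CLS] context [SEP] candidate"; the [ENT] marker around
        # the mention from the slides is omitted in this sketch.
        batch = tokenizer(context, cand, truncation=True, return_tensors="pt")
        with torch.no_grad():
            cls = encoder(**batch).last_hidden_state[:, 0]   # [CLS] vector
        scores.append(score_head(cls).item())
    return scores

print(cross_score("mention context", ["entity description 1", "entity description 2"]))
```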
  8. 12/18 Poly-Encoder • Both encoded representations (see the figure) can be cached → fast inference. • Attention from the candidates → extracts the pertinent parts of the context per candidate (see the sketch below).
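A minimal sketch of the Poly-encoder attention, shapes only; the number of context codes m, the dimensions, and the random inputs are illustrative assumptions, not the paper's hyperparameters.

```python
# Poly-encoder attention sketch (illustrative shapes only).
import torch
import torch.nn.functional as F

hidden, m = 768, 64                                   # m = number of learned context codes
codes = torch.nn.Parameter(torch.randn(m, hidden))    # learned query codes

def poly_score(context_token_states, candidate_vec):
    # context_token_states: (seq_len, hidden) token outputs of the context encoder
    # candidate_vec: (hidden,) cached [CLS] vector of one candidate

    # 1) The m context codes attend over the context tokens -> m context vectors.
    attn = F.softmax(codes @ context_token_states.T, dim=-1)   # (m, seq_len)
    context_vecs = attn @ context_token_states                 # (m, hidden)

    # 2) The candidate attends over the m context vectors, extracting the
    #    pertinent parts of the context for this particular candidate.
    w = F.softmax(candidate_vec @ context_vecs.T, dim=-1)      # (m,)
    context_emb = w @ context_vecs                             # (hidden,)

    # 3) Final score is a dot product with the cached candidate vector.
    return context_emb @ candidate_vec

score = poly_score(torch.randn(128, hidden), torch.randn(hidden))
```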
  9. 13/18 In-Batch Training • In-batch negative sampling [Henderson et al., ‘17; Gillick et al., ‘18]: within one batch, each context is scored by dot product against every gold label in the batch.
  10. 14/18 In-Batch Training • In-batch negative sampling [Henderson et al., ‘17; Gillick et al., ‘18]: for each context, its own gold label is the positive and the gold labels of the other examples in the batch serve as negatives (see the sketch below).
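A minimal sketch of in-batch negative sampling; the batch size, dimensions, and random vectors are illustrative stand-ins for the encoded contexts and gold labels.

```python
# In-batch negative sampling sketch (illustrative only).
# Each row is a context, each column a gold label from the same batch; the
# diagonal holds the positives, off-diagonal entries act as negatives.
import torch
import torch.nn.functional as F

batch_size, hidden = 32, 768
context_vecs = torch.randn(batch_size, hidden)   # encoded contexts
label_vecs = torch.randn(batch_size, hidden)     # encoded gold labels

scores = context_vecs @ label_vecs.T             # (batch, batch) dot products
targets = torch.arange(batch_size)               # gold label i belongs to context i
loss = F.cross_entropy(scores, targets)          # softmax over the batch
```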
  11. 16/18 Results (b.) Comparison with Bi- / Cross- / Poly-encoders • See Table 4 of the original paper. • They also examined the effect of changing the pre-training data for BERT.