Bi-encoder vs. cross-encoder:
• Bi-encoder: faster, but does not model mention-entity cross-attention.
• Cross-encoder: better performance thanks to cross-attention, but slow.
• (Additional:) a pretraining strategy using datasets similar to the downstream tasks.
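The speed difference above can be made concrete with a toy sketch (the `encode` function below is a hypothetical stand-in for a BERT-style encoder, not a real model): a bi-encoder can precompute and cache all entity embeddings, so scoring a mention costs one encoder call plus cheap dot products, while a cross-encoder must run the encoder once per (mention, candidate) pair.

```python
def encode(text):
    # Hypothetical stand-in for a BERT-style encoder: returns a tiny "embedding".
    return [sum(ord(c) for c in text) % 97, len(text) % 13]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

mentions = ["paris hilton", "paris, france"]
entities = ["Paris (city)", "Paris Hilton (person)", "Paris (mythology)"]

# Bi-encoder: entity embeddings are computed ONCE and cached offline;
# each new mention needs 1 encoder call + cheap dot products.
entity_vecs = [encode(e) for e in entities]
bi_scores = [[dot(encode(m), v) for v in entity_vecs] for m in mentions]

# Cross-encoder: every (mention, entity) pair needs its own encoder call,
# because the model attends jointly over both texts.
cross_scores = [[encode(m + " [ENT] " + e)[0] for e in entities] for m in mentions]

print(len(mentions) + len(entities))   # bi-encoder encoder calls: 5
print(len(mentions) * len(entities))   # cross-encoder encoder calls: 6
```

With M mentions and E candidate entities, the bi-encoder makes M + E encoder calls versus M × E for the cross-encoder, which is why the cross-encoder's extra accuracy comes at a real inference cost.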
Cross-encoder scoring [Logeswaran et al., ACL’19]
• Input (BERT-style [Devlin et al., ’18]): [CLS] mention context [ENT] candidate entity description
• [ENT]: embedding indicating the mention location
• The [CLS] representation scores each candidate entity per mention.
• Considers mention-entity cross-attention, but inference is slow: the encoder runs once for each mention and each of its candidates.
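A minimal sketch of how such a cross-encoder input could be assembled, loosely following the slide's format ([CLS] mention context [ENT] candidate entity description); the exact special-token names and placement here are illustrative assumptions, not the paper's verbatim scheme:

```python
def build_cross_encoder_input(ctx, span, entity_desc):
    """Token sequence for cross-encoder candidate scoring.

    The mention span inside the context is marked with [ENT] (the slide's
    embedding indicating the mention location), the candidate entity
    description is appended, and the final hidden state at [CLS] is what
    the model would use to score this candidate.
    """
    s, e = span  # mention occupies ctx[s:e]
    return (["[CLS]"] + ctx[:s] + ["[ENT]"] + ctx[s:e] + ["[ENT]"]
            + ctx[e:] + ["[SEP]"] + entity_desc + ["[SEP]"])

ctx = "he moved to paris in 1998".split()
seq = build_cross_encoder_input(ctx, (3, 4), "Paris capital of France".split())
print(seq[0])  # [CLS]
```

One such sequence must be built and encoded for every (mention, candidate) pair, which is exactly the source of the slow inference noted above.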