Zero-Shot Entity Linking by Reading Entity Descriptions

Supplementary slides.
Other EL paper slides and summaries:
https://github.com/izuna385/EntityLinking_RecentTrend

izuna385

November 02, 2019
Transcript

  1. 2/31 Previous Entity Linking (EL)
     0. Learn/prepare entity (in-KB) representations
     1. Prepare mention/context vectors
     2. Candidate generation
     3. Linking
  2. 3/31 Previous Entity Linking
     0. Learn/prepare entity (in-KB) representations
     1. Prepare mention/context vectors
     2. Candidate generation
     3. Linking
     Two problems: (A) in-domain limited; (B) mention-entity cross attention is not considered.
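
To make the four steps concrete, here is a minimal, hypothetical sketch of this generic EL pipeline; the encoder, candidate generator, and scoring function are placeholders rather than the components of any particular system.

# Hypothetical sketch of the generic EL pipeline above (not the paper's implementation).
from typing import Callable, Dict, List, Tuple

def link_mention(
    mention: str,
    context: str,
    encode_mention: Callable[[str, str], List[float]],  # step 1: mention/context vector
    generate_candidates: Callable[[str], List[str]],     # step 2: e.g. alias-table lookup
    entity_vectors: Dict[str, List[float]],              # step 0: precomputed entity representations
    score: Callable[[List[float], List[float]], float],  # step 3: scoring/similarity function
) -> Tuple[str, float]:
    """Return the highest-scoring candidate entity for one mention."""
    m_vec = encode_mention(mention, context)
    candidates = generate_candidates(mention)
    scored = [(e, score(m_vec, entity_vectors[e])) for e in candidates]
    return max(scored, key=lambda pair: pair[1])
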
  3. 4/31 EL problems: (A) In-domain limited
     • Although recent Wikipedia-based EL research has seen great success, this success is partly due to
       massive mention-entity pairs (~1B) and substantial alias tables for candidate generation,
       which are also built from the abundant hyperlinks.
     • In specialized domains, such annotations are limited and expensive.
     • "Therefore, we need entity linking systems that can generalize to unseen specialized entities."
  4. 5/31 EL problems: (B) Mention-entity cross attention was not considered
     [Figure] Example mention: "Bronchopulmonary Dysplasia was first described by Northway as a lung injury."
     A Mention Encoder encodes the mention/context; an Entity Encoder encodes candidate entities
     (Dysplasia, Pulmonary, BP dysplasia, ...) from their descriptions, structures, etc.;
     candidates are generated per mention, and the entity is predicted by a score function.
     • Previous work: encoded mention vs. encoded candidate entities.
  5. 6/31 (Same figure as the previous slide.)
     • Previous work compared the encoded mention against encoded candidate entities,
       so the mention-description interaction was ignored.
  6. 7/31 Their contributions
     • Proposing zero-shot EL (for (A), in-domain-limited EL)
     • Showing that context-description attention is crucial for EL (for (B), mention-entity interaction)
     • Proposing DA-pretraining for EL (details are described later)
  7. 8/31 Pre-assumption ①: Entity dictionary
     • They first presuppose only an entity dictionary E = {(e_i, d_i)},
       where e_i is an entity and d_i is its text description.
  8. 9/31 Pre-assumption ②: Worlds (W)
     • Each world W has its own:
       - entity dictionary: entities e_i with their descriptions d_i
       - documents belonging to W
       - labeled spans in those documents, annotated with entities from the dictionary
  9. 10/31 Pre-assumption ②: Worlds (W)
     • (Same as the previous slide.) The entity dictionary (entities and their descriptions)
       is constructed from the world's pages.
  10. 11/31 Pre-assumption ②: Worlds (W), concrete example
     [Figure] A world corresponds to a wiki, e.g. meninblack.fandom.com.
     An entity and its description come from an entity page
     (e.g. meninblack.fandom.com/wiki/Frank_the_Pug); mentions appear in the documents,
     which are constructed from the wiki's page collections.
  11. 12/31 (Same example as the previous slide.)
  12. 13/31 Pre-assumption ②: Worlds (W)
     [Figure] Multiple worlds, each with its own entity dictionary (entities and descriptions),
     its own documents, and labeled spans annotated against that dictionary.
  13. 14/31 (Same figure.) The documents with labeled, entity-annotated spans are what
     "Entity Linking" is performed on.
  14. 15/31 (Same figure.) The documents used for EL are down-sampled; the remaining documents
     are preserved as a corpus for domain-adaptive pre-training.
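
As a reading aid, here is a minimal sketch of the data layout these pre-assumptions imply; the class and field names (World, entity_dict, documents, mentions) are illustrative, not taken from the paper or its code.

# Hypothetical data layout for one world W in the zero-shot EL setup.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class World:
    name: str                                    # e.g. "muppets"
    entity_dict: Dict[str, str]                  # entity title -> text description
    documents: Dict[str, str]                    # doc id -> document text belonging to W
    mentions: List[Tuple[str, int, int, str]] = field(default_factory=list)
    # each mention: (doc id, span start, span end, gold entity title), i.e. a labeled span

# The remaining (unlabeled) documents of a target world can be kept aside
# as the corpus for domain-adaptive pre-training.
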
  15. 16/31 Previous LM pretraining vs. DA pretraining
     (LM: language model, DA: domain-adaptive, src: source, tgt: target)
     • Task-adaptive pretraining: learn the LM on the src + tgt corpus → fine-tune on the src corpus
       for a specific task (e.g. NER). (The tgt corpus is assumed to be small.)
  16. 17/31 Previous LM pretraining vs. DA pretraining (cont.)
     • Open-corpus pre-training: learn the LM on a massive src + tgt corpus
       (e.g. ELMo, BERT, SciBERT, ...).
  17. 18/31 Previous LM pretraining vs. DA pretraining (cont.)
     • Domain-adaptive pre-training (DAP) (proposed): the LM is pre-trained only on the tgt corpus.
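
A compact, purely illustrative contrast of the three regimes as stage lists (all names and strings are made up for this sketch, not the paper's training code):

# Illustrative summary of the three pretraining regimes.
PRETRAINING_REGIMES = {
    # pretrain on src + tgt text, then fine-tune on the src-domain task (e.g. NER)
    "task_adaptive": ["pretrain(src + tgt)", "finetune(src, task)"],
    # one massive open-corpus pretraining covering src and tgt (ELMo / BERT / SciBERT style)
    "open_corpus": ["pretrain(massive src + tgt)", "finetune(task)"],
    # proposed DAP: pretraining only on the target-domain corpus
    "domain_adaptive": ["pretrain(tgt)", "finetune(task)"],
}

for name, stages in PRETRAINING_REGIMES.items():
    print(f"{name}: " + " -> ".join(stages))
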
  18. 21/31 Their contributions (slide repeated as a transition)
     • Proposing zero-shot EL (for (A)); showing context-description attention is crucial for EL
       (for (B)); proposing DA-pretraining for EL.
  19. 22/31 (B) Context-description interaction model
     For each generated candidate entity per mention:
     (i) Full-Transformer model (proposed) [Devlin et al., '18]
     Input: [CLS] mention context [SEP] entity description, with an additional embedding
     that indicates the mention location.
  20. 23/31 (B) Context-description interaction model (cont.)
     (i) Full-Transformer model (proposed) [Devlin et al., '18]
     Output: the [CLS] vector.
     • Candidates are scored by a dot product of the [CLS] vector with a learned vector.
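
A minimal sketch of this kind of full cross-attention scorer, assuming a stock BERT encoder from Hugging Face transformers; the extra embeddings that mark the mention location are omitted here, and the function and variable names are illustrative, not the paper's.

# Sketch of a Full-Transformer-style candidate scorer (simplified).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")
w = torch.nn.Linear(encoder.config.hidden_size, 1, bias=False)  # learned scoring vector

def score_candidate(mention_context: str, entity_description: str) -> torch.Tensor:
    # joint input: [CLS] mention context [SEP] entity description [SEP]
    inputs = tokenizer(mention_context, entity_description,
                       truncation=True, max_length=256, return_tensors="pt")
    h_cls = encoder(**inputs).last_hidden_state[:, 0]   # output [CLS] vector
    return w(h_cls).squeeze(-1)                         # score = learned vector · [CLS]

# The generated candidates for one mention are ranked by this score (highest wins).
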
  21. 24/31 (B) Context-description interaction model (cont.)
     (ii) Pool-Transformer model (for comparison) [Devlin et al., '18]
     Mention context and entity description are encoded separately:
     [CLS] mention context [SEP] and [CLS] entity description [SEP].
     Scoring combines the two output [CLS] vectors.
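
For contrast, a sketch of a Pool-Transformer-style scorer under the same assumptions. It reuses the tokenizer and encoder from the sketch above; the paper uses two separate transformers for the two sides, which this simplification collapses into one. The score is a dot product of the two [CLS] vectors, so mention tokens and description tokens never attend to each other.

# Sketch of the Pool-Transformer comparison model (simplified, single shared encoder).
def pool_score(mention_context: str, entity_description: str) -> torch.Tensor:
    m = tokenizer(mention_context, truncation=True, max_length=128, return_tensors="pt")
    d = tokenizer(entity_description, truncation=True, max_length=128, return_tensors="pt")
    h_m = encoder(**m).last_hidden_state[:, 0]   # [CLS] of the mention-context side
    h_d = encoder(**d).last_hidden_state[:, 0]   # [CLS] of the description side
    return (h_m * h_d).sum(-1)                   # dot-product score, no cross attention
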
  22. 25/31 (B) Context-description interaction model (cont.)
     (iii) Cand-Pool-Transformer model (for comparison) [Devlin et al., '18]
     The input is the same as for the Pool-Transformer: mention context and entity description
     are encoded separately.
  23. 26/31 (iii) Cand-Pool-Transformer model (cont.)
     The candidate description vector attends over the mention-context tokens.
  24. 27/31 (iii) Cand-Pool-Transformer model (cont.) [Ganea and Hofmann, '17]
     Scoring over the K candidates generated per mention.
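
And a sketch of the Cand-Pool-Transformer idea under the same assumptions (again reusing the tokenizer and encoder above): the candidate description vector attends over the mention-context token representations. The single-head dot-product attention here is a simplification in the spirit of Ganea and Hofmann ('17), not the exact formulation.

# Sketch of the Cand-Pool-Transformer comparison model (simplified).
import torch.nn.functional as F

def cand_pool_score(mention_context: str, entity_description: str) -> torch.Tensor:
    m = tokenizer(mention_context, truncation=True, max_length=128, return_tensors="pt")
    d = tokenizer(entity_description, truncation=True, max_length=128, return_tensors="pt")
    h_tokens = encoder(**m).last_hidden_state[0]        # mention-context token vectors
    h_d = encoder(**d).last_hidden_state[:, 0]          # candidate description [CLS] vector
    attn = F.softmax(h_tokens @ h_d.squeeze(0), dim=0)  # candidate attends over mention tokens
    pooled = (attn.unsqueeze(-1) * h_tokens).sum(0)     # attention-weighted mention representation
    return pooled @ h_d.squeeze(0)                      # score: pooled mention · candidate vector
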
  25. 29/31 (A) Is the DAP strategy effective for domain adaptation?
     [Figure] Target worlds: Coronation Street, Muppets, Ice Hockey, Elder Scrolls.
     Source corpus: Wikipedia + BookCorpus; source worlds: the 8 worlds apart from dev and test.
     → DAP is effective.
  26. 31/31 Conclusions / their contributions
     • Proposing zero-shot EL (for (A), in-domain-limited EL)
     • Showing that context-description attention is crucial for EL (for (B), mention-entity interaction)
     • Proposing DA-pretraining for EL