

Zero-shot Entity Linking with Dense Entity Retrieval (Unofficial slides) and Entity Linking future directions

Supplement for a journal club.
Entity Linking future directions are also listed.

izuna385

January 22, 2020


Transcript

  1. Previous Entity Linking (EL) pipeline:
     ⓪ Learn/prepare entity (in-KB) representations
     ① Prepare mention/context vectors
     ② Candidate generation
     ③ Linking
     Three problems with this pipeline:
     A. In-domain limited
     B. Only surface-based candidate generation
     C. Mention-entity cross attention is not considered
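To make the four steps concrete, here is a minimal Python sketch of the generic EL pipeline. Everything in it is an illustrative assumption: the `embed` stand-in encoder, the toy corpus, and dot-product similarity are placeholders, not any particular paper's method.

```python
import zlib
import numpy as np

def embed(text, dim=64):
    """Stand-in encoder: a deterministic pseudo-random vector per string.
    A real system would use a trained neural encoder here."""
    return np.random.default_rng(zlib.crc32(text.encode())).standard_normal(dim)

def link(mention_context, entity_descriptions, top_k=2):
    # 0. Learn/prepare entity (in-KB) representations.
    entity_vecs = np.stack([embed(d) for d in entity_descriptions])
    # 1. Prepare the mention/context vector.
    mention_vec = embed(mention_context)
    # 2. Candidate generation: keep the top-k entities by similarity.
    scores = entity_vecs @ mention_vec
    candidates = np.argsort(-scores)[:top_k]
    # 3. Linking: choose the best-scoring candidate.
    return int(candidates[0]), candidates.tolist()

best, cands = link("BPD was first described by Northway as a lung injury.",
                   ["Bronchopulmonary dysplasia: a chronic lung disease ...",
                    "Borderline personality disorder: a mental illness ...",
                    "Northway: a place name ..."])
print(best, cands)
```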
  2. A. In-domain limited EL: problems
     • Wikipedia-based EL successes were partly due to:
       - massive mention-entity pairs (1B+)
       - a substantial alias table for candidate generation.
     • Under specific domains, these annotations are limited and expensive.
     • "Therefore, we need entity linking systems that can generalize to unseen
       specialized entities."
  3. B. Surface-based candidate generation: generation failure examples
     • Abbreviation: mention in document: "ALL"
       Generated candidates: "All Sites", "All of the Time", "Alleviation"
       Gold entity: "Acute lymphocytic leukemia"
     • Common-name mention: mention in document: "Giα"
       Generated candidates: "Gin", "Gibraltar", "Gill structure"
       Gold entity: "GTP-Binding Protein alpha Subunit, Gi"
     • Mentions' orthographical variants also cause failures.
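A minimal sketch of why surface matching fails on the "ALL" example above. The candidate strings are taken from the slide; the character-overlap scorer (difflib) is an illustrative stand-in for an alias table or fuzzy string match, not the paper's actual candidate generator.

```python
from difflib import SequenceMatcher

def surface_score(mention, entity):
    """Character-overlap similarity: a stand-in for surface-based
    candidate generation (alias table / fuzzy match)."""
    return SequenceMatcher(None, mention.lower(), entity.lower()).ratio()

mention = "ALL"
entities = ["All Sites", "All of the Time", "Alleviation",
            "Acute lymphocytic leukemia"]  # the last one is the gold entity
for e in sorted(entities, key=lambda e: -surface_score(mention, e)):
    print(f"{surface_score(mention, e):.2f}  {e}")
# The gold entity ranks last: an abbreviation shares almost no
# surface form with its expansion.
```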
  4. C. Mention-entity cross attention is not considered.
     Example: "Bronchopulmonary Dysplasia was first described by Northway as a
     lung injury."
     • Previous approach: encoded mention vs. encoded candidate entities.
       ⓪ Encode candidate entities ("Dysplasia", "Pulmonary", "BPdysplasia", ...)
         using their descriptions, structures, etc. (Entity Encoder)
       ① Mention/context encoding (Mention Encoder)
       ② Candidate entity generation for one mention
       ③ Predict the entity by a score function
     → Fixed-vector comparison: the mention-description interaction is ignored.
  5. Baselines / their contributions
     • Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions
       [Logeswaran et al., ACL'19]
     • Main contribution: Logeswaran et al. used surface-based candidate
       generation (CG) → change this to embedding search and show higher recall.
     • Sub contribution: Logeswaran et al. used a slow cross-encoder
       (details later) → compare it with the fast bi-encoder
       [Humeau et al., ICLR'20 poster].
  6. Encoder structure (A.) Bi-encoder [Humeau et al., ICLR'20]
     • Mention and entity are encoded separately; each encoder's [CLS] vector
       is used.
     • Entity vectors can be cached for fast search.
     • But it can't consider mention-entity cross-attention.
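A minimal numpy sketch of the bi-encoder's speed advantage: entity vectors are computed once offline and cached, so linking a mention online is one matrix-vector product plus a top-k lookup. The random vectors stand in for trained [CLS] embeddings; sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_ENTITIES, DIM = 10_000, 128

# Offline: encode every in-KB entity once and cache the matrix
# (stand-in for a trained entity encoder's [CLS] vectors).
entity_cache = rng.standard_normal((NUM_ENTITIES, DIM)).astype(np.float32)

def retrieve(mention_vec, k=64):
    """Online: one dot product against the cache, then top-k candidates."""
    scores = entity_cache @ mention_vec
    topk = np.argpartition(-scores, k)[:k]
    return topk[np.argsort(-scores[topk])]

mention_vec = rng.standard_normal(DIM).astype(np.float32)  # stand-in mention [CLS]
print(retrieve(mention_vec)[:5])
```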
  7. Encoder structure (B.) Cross-encoder [Devlin et al., '18]
     [Logeswaran et al., ACL'19]
     • For each generated candidate entity per mention:
       input: [CLS] mention context [ENT] candidate entity description
       ([ENT]: an embedding indicating the mention location)
       scoring: from the output [CLS] vector.
     • Considers mention-entity cross attention.
     • But inference is slow for each mention and its candidates.
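A structural sketch of cross-encoder scoring, to show why it is slow: every mention-candidate pair needs its own forward pass. The marker layout is inferred from the slide, and the scorer is a dummy stand-in, not a trained BERT model.

```python
def cross_encoder_score(pair_text):
    """Stand-in for a BERT-style model that reads the concatenated pair
    and scores it from the [CLS] output; here just a placeholder."""
    return float(len(set(pair_text.split())))

def rank_candidates(mention_ctx, mention, candidates):
    scored = []
    for name, description in candidates.items():
        # One encoder forward pass per candidate: O(K) passes per mention.
        pair = f"[CLS] {mention_ctx} [ENT] {mention} [ENT] [SEP] {description}"
        scored.append((cross_encoder_score(pair), name))
    return sorted(scored, reverse=True)

cands = {"Bronchopulmonary dysplasia": "a chronic lung disease of newborns ...",
         "Dysplasia": "abnormal cell or tissue growth ..."}
print(rank_candidates("BPD was first described by Northway as a lung injury.",
                      "BPD", cands))
```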
  8. Optimization and evaluation
     • Optimization: based on gold / random negative sampling.
     • Evaluation:
       - Recall@64: is the gold entity among the top-64 scored entities?
       - Accuracy: is the top-1 scored entity the gold one?
       - Normalized acc.: evaluated only on mentions for which candidate
         generation succeeded.
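A short sketch of how these three metrics are computed, assuming ranked candidate IDs per mention; the function names and data layout are my own, chosen to mirror the slide's definitions.

```python
def recall_at_k(ranked_ids, gold, k=64):
    """Is the gold entity among the top-k scored entities?"""
    return gold in ranked_ids[:k]

def accuracy(ranked_ids, gold):
    """Is the top-1 scored entity the gold one?"""
    return ranked_ids[0] == gold

def normalized_accuracy(predictions, golds, candidate_sets):
    """Average accuracy only over mentions whose candidate set actually
    contains the gold entity, i.e. candidate generation succeeded."""
    kept = [(p, g) for p, g, c in zip(predictions, golds, candidate_sets)
            if g in c]
    return sum(p == g for p, g in kept) / max(len(kept), 1)
```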
  9. Result (1): BM25 vs. bi-encoder brute-force candidate generation, on the
     zero-shot dataset. Both settings used a cross-encoder for final ranking.
     [Figure: recall for "cross-encoder" and "bi-encoder + BT";
     BT = brute-force search]
  10. Result (3): Bi-encoder vs. cross-encoder, on the TAC-KBP10 dataset.
      • Bi-encoder: fast but can't consider mention-entity cross attention.
      • Cross-encoder: slow but considers it.
      [Figure: accuracy for "cross-encoder + BT", "bi-encoder + BT",
      "simple-encoder + BT"; BT = brute-force search]
  11. Conclusions
      • A fast and scalable EL model for new/general domains.
      • Even when cross-attention is removed, the fast EL model achieves good
        accuracy.
  12. Entity Linking future directions (1)
      • Distant / no-label situations.
        - [Le and Titov, ACL'19a]: surface match + multi-instance learning.
        - [Le and Titov, ACL'19b]: Spacy + (…) + Wikipedia hyperlink
          edges/statistics.
  13. Entity Linking future directions (2)
      • Improving entity representations. The slide organizes recent work along
        two axes: does the method require "entity-span" annotations, and does it
        use relations? Works placed in this taxonomy:
        JointEnt [Yamada et al., ACL'17], KnowBert [Peters et al., EMNLP'19]
        (indirectly annotated data used), KEPLER [Wang et al., Nov '19],
        DEER [Gillick et al., CoNLL'19], ERNIE [Zhang et al., ACL'19],
        BertEnt [Yamada et al., '19], EntEval [Chen et al., EMNLP'19],
        WKLM [Xiong et al., ICLR'20].
      • Various evaluation metrics exist: entity typing, entity disambiguation,
        fact completion, QA, ...
  14. Entity Linking future directions (3)
      • No need for entity descriptions? [Chen et al., EMNLP'19]
      • [Chen et al., EMNLP'19] introduced 8 entity-evaluation tasks, e.g.:
        - Rare: rare entity prediction (Cloze task) in documents.
        - CoNLL: named entity disambiguation.
        - ERT: relation typing between two entities.
  15. EntEval 8 tasks [Chen et al., EMNLP'19]
      • Rare: rare entity prediction (Cloze task) in documents.
      • CoNLL: named entity disambiguation.
      • ERT: relation typing between two entities.
      • ET: entity typing.
      • ESR: entity similarity and relatedness.
      • CAP: coreference arc prediction.
      • EFP: entity factuality prediction.
      • CERP: contextualized entity relationship prediction.
  16. [Logeswaran et al., ACL'19]'s contributions
      • Proposing zero-shot EL (addresses (A), in-domain limited EL).
      • Showing context-description attention is crucial for EL (addresses (B),
        mention-entity interaction).
      • Proposing domain-adaptive (DA) pretraining for EL (details later).
  17. Pre-assumption ①: Entity dictionary
      • They first presuppose only an entity dictionary: a set of entities, each
        paired with its description.
  18. Pre-assumption ②: Worlds (W)
      • Each world W has its own:
        - entity dictionary: entities and their descriptions, constructed from
          pages of that world's collection (e.g.,
          meninblack.fandom.com/wiki/Frank_the_Pug);
        - documents belonging to W, with labeled mention spans annotated by the
          entities of that world, constructed from the same collections.
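In symbols, a minimal reconstruction of the setup; the notation below is my own assumption, not necessarily the paper's exact symbols (the slide's originals did not carry over).

```latex
% E_W: entity dictionary of world W, d(e): description of entity e,
% D_W: documents belonging to W, M_W: labeled mention spans in D_W.
\[
W = \bigl(\mathcal{E}_W,\; \mathcal{D}_W,\; \mathcal{M}_W\bigr), \qquad
\mathcal{E}_W = \{(e,\, d(e))\}, \qquad
\mathcal{M}_W = \{(m,\, e) \mid m \in \mathcal{D}_W,\; e \in \mathcal{E}_W\}
\]
```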
  19. Pre-assumption ②: Worlds (W), continued
      • The entity dictionary plus the labeled mention spans are what "Entity
        Linking" uses.
      • The labeled documents are down-sampled; the remaining documents are
        preserved as a corpus for domain-adaptive pre-training.
  20. Previous pretraining LM vs. DA pretraining LM
      (LM: language model; DA: domain-adaptive; src: source; tgt: target)
      • Task-adaptive pretraining: learn with the src + tgt corpus → finetune
        with the src corpus for a specific task (e.g., NER). (The tgt corpus is
        supposed to be small.)
      • Open-corpus pre-training: learn with a massive src + tgt corpus
        (e.g., ELMo, BERT, SciBERT, ...).
      • Domain-adaptive pre-training (DAP) (proposed): pre-train only on the tgt
        corpus.
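A hedged sketch of what DAP amounts to in practice: continue masked-LM pretraining on the target-world corpus only, here via Hugging Face transformers. The model name, hyperparameters, and toy corpus are assumptions; the paper's actual training setup may differ.

```python
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Target-world corpus only (DAP): documents from the new domain, e.g. the
# world documents kept aside for pre-training. Toy stand-in:
target_corpus = ["Frank the Pug is a character in Men in Black.",
                 "Bronchopulmonary dysplasia is a chronic lung disease."]

class LineDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        self.enc = [tokenizer(t, truncation=True, max_length=128) for t in texts]
    def __len__(self):
        return len(self.enc)
    def __getitem__(self, i):
        return {"input_ids": self.enc[i]["input_ids"]}

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)
args = TrainingArguments(output_dir="dap-bert", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=LineDataset(target_corpus),
        data_collator=collator).train()
```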
  21. (B) Context-description interaction model [Devlin et al., '18]
      • For each generated candidate entity per mention:
      (i) Full-transformer model (proposed)
          input: [CLS] mention context [SEP] entity description
          (with an embedding indicating the mention location)
          output: the [CLS] vector; candidates are scored by a dot product
          between the [CLS] vector and a learned vector.
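As a formula, under assumed notation: with h_[CLS] the output vector for a mention-candidate pair and w the learned scoring vector,

```latex
\[
s(m, e) = \mathbf{w}^{\top}\, \mathbf{h}_{[\mathrm{CLS}]}(m, e)
\]
```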
  22. (ii) Pool-transformer model (for comparison) [Devlin et al., '18]
      • Mention context and entity description go through two separate
        transformers: [CLS] mention context [SEP] and
        [CLS] entity description [SEP].
      • Scoring compares the two output [CLS] vectors.
  23. (iii) Cand-Pool-transformer model (for comparison) [Devlin et al., '18]
      • Same input as the Pool-transformer, but the candidate description vector
        attends to the mention-context representations
        [Ganea and Hofmann, '17].
      • Scoring is over the K candidates per mention.
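Across the K candidates of a mention, per-candidate scores are normalized with a softmax for training and prediction (again under assumed notation):

```latex
\[
p(e_k \mid m) =
  \frac{\exp\bigl(s(m, e_k)\bigr)}{\sum_{k'=1}^{K} \exp\bigl(s(m, e_{k'})\bigr)},
\qquad k = 1, \dots, K
\]
```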
  24. (A): Is the DAP strategy effective for domain adaptation?
      • src corpus: Wikipedia + Book corpus; tgt corpus: 8 worlds, apart from
        dev and test.
      [Figure: results per world for Coronation Street, Muppets, Ice Hockey,
      Elder Scrolls]
      → DAP is effective.
  25. Conclusions / their contributions
      • Proposing zero-shot EL (addresses (A), in-domain limited EL).
      • Showing context-description attention is crucial for EL (addresses (B),
        mention-entity interaction).
      • Proposing DA-pretraining for EL.