Slide 1

Zero-Shot Entity Linking with Dense Entity Retrieval
Unofficial slides by @izuna385

Slide 2

Previous Entity Linking (EL)
0. Learn/prepare entity (in-KB) representations
1. Prepare mention/context vector
2. Candidate generation
3. Linking

Slide 3

Previous Entity Linking
0. Learn/prepare entity (in-KB) representations
1. Prepare mention/context vector
2. Candidate generation
3. Linking
A. In-domain limited

Slide 4

Previous Entity Linking
0. Learn/prepare entity (in-KB) representations
1. Prepare mention/context vector
2. Candidate generation
3. Linking
A. In-domain limited
B. Only surface-based candidate generation

Slide 5

Previous Entity Linking
0. Learn/prepare entity (in-KB) representations
1. Prepare mention/context vector
2. Candidate generation
3. Linking
A. In-domain limited
B. Only surface-based candidate generation
C. Mention-entity cross attention is not considered

Slide 6

EL problems: A. In-domain limited
• Wikipedia-based EL successes were partly due to massive mention-entity pairs (1B+) and a substantial alias table for candidate generation.

Slide 7

EL problems: A. In-domain limited
• Wikipedia-based EL successes were partly due to massive mention-entity pairs (1B+) and a substantial alias table for candidate generation.
• Under specific domains, these annotations are limited and expensive.
• "Therefore, we need entity linking systems that can generalize to unseen specialized entities."

Slide 8

B. Surface-based candidate generation
• Generation failure example (abbreviation):
  Mention in document: "ALL"
  Generated candidates: "All Sites", "All of the Time", "Alleviation"
  Gold entity: "Acute lymphocytic leukemia"

Slide 9

B. Surface-based candidate generation
• Generation failure examples:
  Mention in document: "ALL" (abbreviation)
  Generated candidates: "All Sites", "All of the Time", "Alleviation"
  Gold entity: "Acute lymphocytic leukemia"

  Mention in document: "Giα" (common-name mention)
  Generated candidates: "Gin", "Gibraltar", "Gill structure"
  Gold entity: "GTP-Binding Protein alpha Subunit, Gi"

Slide 10

B. Surface-based candidate generation
• Generation failure examples:
  Mention in document: "ALL" (abbreviation)
  Generated candidates: "All Sites", "All of the Time", "Alleviation"
  Gold entity: "Acute lymphocytic leukemia"

  Mention in document: "Giα" (common-name mention)
  Generated candidates: "Gin", "Gibraltar", "Gill structure"
  Gold entity: "GTP-Binding Protein alpha Subunit, Gi"
• These failures come from the mention's orthographical variants.

Slide 11

C. Mention-entity cross attention was not considered
• Previous: encoded mention vs. encoded candidate entities.
(Pipeline diagram: a Mention Encoder encodes the mention/context, e.g., "Bronchopulmonary Dysplasia was first described by Northway as a lung injury."; candidate entities are generated for the mention (Dysplasia, Pulmonary, BPdysplasia, ...); an Entity Encoder encodes the candidate entities using their descriptions, structures, etc.; the entity is predicted by a score function.)

Slide 12

C. Mention-entity cross attention was not considered
• Previous: encoded mention vs. encoded candidate entities.
(Pipeline diagram: a Mention Encoder encodes the mention/context, e.g., "Bronchopulmonary Dysplasia was first described by Northway as a lung injury."; candidate entities are generated for the mention (Dysplasia, Pulmonary, BPdysplasia, ...); an Entity Encoder encodes the candidate entities using their descriptions, structures, etc.; the entity is predicted by a score function.)
→ Fixed vector comparison.

Slide 13

C. Mention-entity cross attention was not considered
• Previous: encoded mention vs. encoded candidate entities.
(Pipeline diagram: a Mention Encoder encodes the mention/context, e.g., "Bronchopulmonary Dysplasia was first described by Northway as a lung injury."; candidate entities are generated for the mention (Dysplasia, Pulmonary, BPdysplasia, ...); an Entity Encoder encodes the candidate entities using their descriptions, structures, etc.; the entity is predicted by a score function.)
→ The mention-description interaction was ignored.

Slide 14

Baselines / Their contributions
• Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]

Slide 15

Baselines / Their contributions
• Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]
• Main contribution: Logeswaran et al. used surface-based candidate generation (CG)

Slide 16

Baselines / Their contributions
• Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]
• Main contribution: Logeswaran et al. used surface-based candidate generation (CG)
  → Change this to embedding search and show higher recall.

Slide 17

Baselines / Their contributions
• Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]
• Main contribution: Logeswaran et al. used surface-based candidate generation (CG)
  → Change this to embedding search and show higher recall.
• Sub contribution: Logeswaran et al. used a slow cross-encoder. (Details later.)

Slide 18

Baselines / Their contributions
• Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]
• Main contribution: Logeswaran et al. used surface-based candidate generation (CG)
  → Change this to embedding search and show higher recall.
• Sub contribution: Logeswaran et al. used a slow cross-encoder. (Details later.)
  → Compare this with the fast bi-encoder [Humeau et al., ICLR'20 poster].

Slide 19

Encoder structure (A) Bi-encoder [Humeau et al., ICLR'20]
(Diagram: the mention is encoded on its own; the encoder's [CLS] output serves as the mention embedding.)

Slide 20

Encoder structure (A) Bi-encoder [Humeau et al., ICLR'20]
(Diagram: mention and entity are encoded by separate encoders; each [CLS] output serves as the embedding.)
• Caching (of entity embeddings) for fast search.

Slide 21

Encoder structure (A) Bi-encoder [Humeau et al., ICLR'20]
(Diagram: mention and entity are encoded by separate encoders; each [CLS] output serves as the embedding.)
• Caching (of entity embeddings) for fast search.
• But: it can't consider cross-attention.

Slide 22

Encoder structure (B) Cross-encoder [Devlin et al., '18][Logeswaran et al., ACL'19]
• For each generated candidate entity per mention, consider mention-entity cross attention.

Slide 23

Encoder structure (B) Cross-encoder [Devlin et al., '18][Logeswaran et al., ACL'19]
• For each generated candidate entity per mention:
  input: [CLS] mention context [ENT] candidate entity descriptions
  (An extra embedding indicates the mention location.)

Slide 24

Encoder structure (B) Cross-encoder [Devlin et al., '18][Logeswaran et al., ACL'19]
• For each generated candidate entity per mention:
  input: [CLS] mention context [ENT] candidate entity descriptions
  (An extra embedding indicates the mention location.)
  The [CLS] output is used for scoring.

Slide 25

Encoder structure (B) Cross-encoder [Devlin et al., '18][Logeswaran et al., ACL'19]
• For each generated candidate entity per mention:
  input: [CLS] mention context [ENT] candidate entity descriptions
  (An extra embedding indicates the mention location.)
  The [CLS] output is used for scoring.
• Considers mention-entity cross attention.

Slide 26

Encoder structure (B) Cross-encoder [Devlin et al., '18][Logeswaran et al., ACL'19]
• For each generated candidate entity per mention:
  input: [CLS] mention context [ENT] candidate entity descriptions
  (An extra embedding indicates the mention location.)
  The [CLS] output is used for scoring.
• Considers mention-entity cross attention.
• But: slow inference for each mention and its candidates.

Slide 27

Optimization and Evaluation
• Optimization: based on gold / random negative sampling

Slide 28

Optimization and Evaluation
• Optimization: based on gold / random negative sampling
• Evaluation
  Recall@64: is the gold entity among the top-64 scored entities?
  Accuracy: is the top-1 scored entity the gold one?

Slide 29

Optimization and Evaluation
• Optimization: based on gold / random negative sampling
• Evaluation
  Recall@64: is the gold entity among the top-64 scored entities?
  Accuracy: is the top-1 scored entity the gold one?
  Normalized acc.: evaluated only on mentions for which candidate generation succeeded.

Slide 30

Result (1) BM25 vs. bi-encoder brute-force
• On the zero-shot dataset.
(Figure legend: cross-encoder; bi-encoder + BT. BT = brute-force search.)

Slide 31

Result (1) BM25 vs. bi-encoder brute-force
• On the zero-shot dataset. Both used a cross-encoder.
(Figure legend: cross-encoder; bi-encoder + BT. BT = brute-force search.)

Slide 32

Result (2) Normalized acc. evaluation
• On the zero-shot dataset. Both used a cross-encoder.

Slide 33

Result (3) Bi-encoder vs. cross-encoder
• On the TAC-KBP10 dataset.
• Bi-encoder: fast, but can't consider mention-entity cross attention.
  Cross-encoder: slow, but considers it.

Slide 34

Result (3) Bi-encoder vs. cross-encoder
• On the TAC-KBP10 dataset.
• Bi-encoder: fast, but can't consider mention-entity cross attention.
  Cross-encoder: slow, but considers it.
(Figure legend: cross-encoder + BT; bi-encoder + BT; simple-encoder + BT. BT = brute-force search.)

Slide 35

Conclusions
• A fast and scalable EL model for new/general domains.
• Even with cross-attention removed, the fast EL model achieves good accuracy.

Slide 36

Entity Linking future directions (1)
• Distant / no-label situations.

Slide 37

Entity Linking future directions (1)
• Distant / no-label situations.
  [Le and Titov, ACL'19a]: surface match + multi-instance learning.
  [Le and Titov, ACL'19b]: spaCy + Wikipedia hyperlink edges/statistics.

Slide 38

Entity Linking future directions (2)
• Improving entity representations.

Slide 39

Entity Linking future directions (2)
• Improving entity representations.
(Decision-tree diagram grouping models by "Require entity-span annotations?" and "Use relations?": JointEnt [Yamada et al., ACL'17]; KnowBert [Peters et al., EMNLP'19] (indirectly annotated data used); KEPLER [Wang et al., Nov '19]; DEER [Gillick et al., CoNLL'19]; ERNIE [Zhang et al., ACL'19]; BertEnt [Yamada et al., '19]; EntEval [Chen et al., EMNLP'19]; WKLM [Xiong et al., ICLR'20].)

Slide 40

Entity Linking future directions (2)
• Improving entity representations.
(Decision-tree diagram grouping models by "Require entity-span annotations?" and "Use relations?": JointEnt [Yamada et al., ACL'17]; KnowBert [Peters et al., EMNLP'19] (indirectly annotated data used); KEPLER [Wang et al., Nov '19]; DEER [Gillick et al., CoNLL'19]; ERNIE [Zhang et al., ACL'19]; BertEnt [Yamada et al., '19]; EntEval [Chen et al., EMNLP'19]; WKLM [Xiong et al., ICLR'20].)
• Various evaluation metrics exist: entity typing, entity disambiguation, fact completion, QA, ...

Slide 41

Entity Linking future directions (3)
• No need for entity descriptions? [Chen et al., EMNLP'19]
  [Chen et al., EMNLP'19] introduced 8 entity-evaluation tasks:
  • Rare: rare entity prediction (a cloze task) in documents.
  • CoNLL: named entity disambiguation.
  • ERT: relation typing between two entities.
  ...

Slide 42

Entity Linking future directions (3)
• No need for entity descriptions? [Chen et al., EMNLP'19]
  • Rare: rare entity prediction (a cloze task) in documents.
  • CoNLL: named entity disambiguation.
  • ERT: relation typing between two entities.

Slide 44

Supplementation

Slide 45

EntEval 8 tasks [Chen et al., EMNLP'19]
• Rare: rare entity prediction (a cloze task) in documents.
• CoNLL: named entity disambiguation.
• ERT: relation typing between two entities.
• ET: entity typing.
• ESR: entity similarity and relatedness.
• CAP: coreference arc prediction.
• EFP: entity factuality prediction.
• CERP: contextualized entity relationship prediction.

Slide 46

[Logeswaran et al., ACL'19]'s contributions
• Proposing zero-shot EL: for (A) in-domain-limited EL.
• Showing context-description attention is crucial for EL: for (B) mention-entity interaction.
• Proposing DA-pretraining for EL. (Details are described later.)

Slide 47

Pre-assumption ①: Entity dictionary
• They first presuppose only an entity dictionary: each entity paired with its description.

Slide 48

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - entities and their descriptions
  - documents belonging to W
  - labeled spans in those documents, annotated with entities

Slide 49

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - entities and their descriptions
  - documents belonging to W
  - labeled spans in those documents, annotated with entities
(Diagram annotation: constructed from pages.)

Slide 50

Zero-shot EL datasets [Logeswaran et al., ACL'19]
• Each world W is constructed from W's Wikia.

Slide 51

Pre-assumption ②: Worlds (W)
(Example: meninblack.fandom.com/wiki/Frank_the_Pug. The wiki page provides an entity and its description; mention documents are constructed from the wiki's page collection.)

Slide 52

Pre-assumption ②: Worlds (W)
(Example: meninblack.fandom.com/wiki/Frank_the_Pug. The wiki page provides an entity and its description; mention documents are constructed from the wiki's page collection.)

Slide 53

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - entities and their descriptions
  - documents belonging to W
  - labeled spans in those documents, annotated with entities

Slide 54

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - entities and their descriptions
  - documents belonging to W
  - labeled spans in those documents, annotated with entities
• The labeled spans are what "Entity Linking" is trained and evaluated on.

Slide 55

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - entities and their descriptions
  - documents belonging to W
  - labeled spans in those documents, annotated with entities
• The labeled documents are down-sampled; the remaining documents are preserved as a corpus for domain-adaptive pre-training.

Slide 56

Previous pretraining LM vs. DA pretraining LM
(LM: language model; DA: domain-adaptive; src: source; tgt: target)
• Task-adaptive pretraining: learn with the src + tgt corpus → fine-tune with the src corpus for a specific task (e.g., NER). (The tgt corpus is supposed to be small.)

Slide 57

Previous pretraining LM vs. DA pretraining LM
(LM: language model; DA: domain-adaptive; src: source; tgt: target)
• Task-adaptive pretraining: learn with the src + tgt corpus → fine-tune with the src corpus for a specific task (e.g., NER). (The tgt corpus is supposed to be small.)
• Open-corpus pre-training: learn with a massive src + tgt corpus (e.g., ELMo, BERT, SciBERT, ...).

Slide 58

Previous pretraining LM vs. DA pretraining LM
(LM: language model; DA: domain-adaptive; src: source; tgt: target)
• Task-adaptive pretraining: learn with the src + tgt corpus → fine-tune with the src corpus for a specific task (e.g., NER). (The tgt corpus is supposed to be small.)
• Open-corpus pre-training: learn with a massive src + tgt corpus (e.g., ELMo, BERT, SciBERT, ...).
• Domain-adaptive pre-training (DAP) (proposed): pre-train only on the tgt corpus.

Slide 59

When and why DAP?

Slide 60

How to prepare the src/tgt corpus for fine-tuning the LM?

Slide 61

Their contributions
• Proposing zero-shot EL: for (A) in-domain-limited EL.
• Showing context-description attention is crucial for EL: for (B) mention-entity interaction.
• Proposing DA-pretraining for EL. (Details are described later.)

Slide 62

(B) Context-description interaction model
• For each generated candidate entity per mention:
  (i) Full-transformer model (proposed)
  input: [CLS] mention context [SEP] entity descriptions [Devlin et al., '18]
  (An extra embedding indicates the mention location.)

Slide 63

(B) Context-description interaction model
• For each generated candidate entity per mention:
  (i) Full-transformer model (proposed)
  output: the [CLS] representation h_[CLS] [Devlin et al., '18]
  • Candidates are scored as s = wᵀ h_[CLS], where w is a learned vector.

Slide 64

(B) Context-description interaction model
• For each generated candidate entity per mention:
  (ii) Pool-transformer model (for comparison)
  Mention context ([CLS] ... [SEP]) and entity descriptions ([CLS] ... [SEP]) are encoded separately; the two [CLS] outputs are used for scoring. [Devlin et al., '18]

Slide 65

(B) Context-description interaction model
• For each generated candidate entity per mention:
  (iii) Cand-Pool-transformer model (for comparison)
  The input is the same: separate [CLS] mention context [SEP] and [CLS] entity descriptions [SEP] encodings. [Devlin et al., '18]

Slide 66

(B) Context-description interaction model
• For each generated candidate entity per mention:
  (iii) Cand-Pool-transformer model (for comparison)
  The input is the same; the description representation (d) attends to the mention. [Devlin et al., '18]

Slide 67

(B) Context-description interaction model
• For each generated candidate entity per mention:
  (iii) Cand-Pool-transformer model (for comparison)
  Scoring follows [Ganea and Hofmann, '17]. (K: candidates per mention.)

Slide 68

Result for changing the resource for pretraining the LM
• NOTE: this is not DAP.

Slide 69

(A): Is the DAP strategy effective for DA?
(Figure: test worlds Coronation Street, Muppets, Ice Hockey, Elder Scrolls; pretraining corpora compared: Wikipedia + Book corpus vs. 8 worlds apart from dev and test.)
→ DAP is effective.

Slide 70

(B): Is mention-entity description attention powerful?
→ Mention-entity cross-attention is effective.

Slide 71

Conclusions / Their contributions
• Proposing zero-shot EL: for (A) in-domain-limited EL.
• Showing context-description attention is crucial for EL: for (B) mention-entity interaction.
• Proposing DA-pretraining for EL.