
Zero-shot Entity Linking with Dense Entity Retrieval (Unofficial slides) and Entity Linking future directions

Supplement for a journal club.
Entity Linking future directions are also listed.

izuna385

January 22, 2020

Transcript

  1. 2/42 Previous Entity Linking (EL): 0. Learn/prepare entity (in-KB) representations; 1. Prepare mention/context vector; 2. Candidate generation; 3. Linking.
  2. 3/42 Previous Entity Linking: 0. Learn/prepare entity (in-KB) representations; 1. Prepare mention/context vector; 2. Candidate generation; 3. Linking. Problem A: in-domain limited.
  3. 4/42 Previous Entity Linking: 0. Learn/prepare entity (in-KB) representations; 1. Prepare mention/context vector; 2. Candidate generation; 3. Linking. Problems: A. in-domain limited; B. only surface-based candidate generation.
  4. 5/42 Previous Entity Linking: 0. Learn/prepare entity (in-KB) representations; 1. Prepare mention/context vector; 2. Candidate generation; 3. Linking. Problems: A. in-domain limited; B. only surface-based candidate generation; C. mention-entity cross attention is not considered. (A pipeline sketch follows this item.)
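
To make the four steps concrete, here is a minimal Python sketch of the generic pipeline above; every function body is a hypothetical stand-in (random vectors, a truncated candidate list), not any particular paper's method.

```python
import numpy as np

def learn_entity_representations(entity_names):
    # Step 0: map each in-KB entity to a vector (random stand-ins here).
    rng = np.random.default_rng(0)
    return {e: rng.normal(size=64) for e in entity_names}

def encode_mention(mention, context):
    # Step 1: encode the mention together with its context (stand-in encoder).
    rng = np.random.default_rng(abs(hash((mention, context))) % 2**32)
    return rng.normal(size=64)

def generate_candidates(mention, entity_vecs, k=3):
    # Step 2: restrict the KB to a small candidate set for this mention.
    # Real systems use alias tables (surface match) or embedding search.
    return list(entity_vecs)[:k]

def link(mention_vec, candidates, entity_vecs):
    # Step 3: pick the candidate whose vector scores highest (dot product).
    return max(candidates, key=lambda e: float(entity_vecs[e] @ mention_vec))

entity_vecs = learn_entity_representations(["e1", "e2", "e3"])
m = encode_mention("ALL", "diagnosed with ALL last year")
print(link(m, generate_candidates("ALL", entity_vecs), entity_vecs))
```
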
  5. 6/42 Problem A: in-domain-limited EL. Wikipedia-based EL successes were partly due to massive mention-entity pairs (~1B) and a substantial alias table for candidate generation.
  6. 7/42 Problem A: in-domain-limited EL. Wikipedia-based EL successes were partly due to massive mention-entity pairs (~1B) and a substantial alias table for candidate generation. Under specific domains, these annotations are limited and expensive. "Therefore, we need entity linking systems that can generalize to unseen specialized entities."
  7. 8/42 Problem B: surface-based candidate generation. Generation-failure example (abbreviation): mention in document: "ALL"; generated candidates: "All Sites", "All of the Time", "Alleviation"; gold entity: "Acute lymphocytic leukemia".
  8. 9/42 Problem B: surface-based candidate generation. Generation-failure examples: (abbreviation) mention in document: "ALL"; generated candidates: "All Sites", "All of the Time", "Alleviation"; gold entity: "Acute lymphocytic leukemia". (Common-name mention) mention in document: "Giα"; generated candidates: "Gin", "Gibraltar", "Gill structure"; gold entity: "GTP-Binding Protein alpha Subunit, Gi".
  9. 10/42 Problem B: surface-based candidate generation. Generation-failure examples: (abbreviation) mention in document: "ALL"; generated candidates: "All Sites", "All of the Time", "Alleviation"; gold entity: "Acute lymphocytic leukemia". (Common-name mention) mention in document: "Giα"; generated candidates: "Gin", "Gibraltar", "Gill structure"; gold entity: "GTP-Binding Protein alpha Subunit, Gi". Mentions' orthographical variants cause failures as well. (See the sketch below.)
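
A minimal sketch of this failure mode: with a purely surface-based alias table, the abbreviation "ALL" can only retrieve lexically similar entries, never the gold entity. The alias table and the overlap heuristic below are made up for illustration.

```python
import os

# Toy alias table: surface form -> entity. "ALL" as an abbreviation of
# "Acute lymphocytic leukemia" never appears as a key.
alias_table = {
    "all sites": "All Sites",
    "all of the time": "All of the Time",
    "alleviation": "Alleviation",
    "acute lymphocytic leukemia": "Acute lymphocytic leukemia",
}

def surface_candidates(mention, k=3):
    m = mention.lower()
    # Rank aliases by a crude surface overlap: shared-prefix length.
    ranked = sorted(alias_table,
                    key=lambda a: -len(os.path.commonprefix([a, m])))
    return [alias_table[a] for a in ranked[:k]]

print(surface_candidates("ALL"))  # the gold entity is never generated
```
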
  10. 11/42 Problem C: mention-entity cross attention was not considered. Example: "Bronchopulmonary Dysplasia was first described by Northway as a lung injury." Previous approach: ① encode the mention/context with a Mention Encoder; ② generate candidate entities for the mention (e.g., Dysplasia, Pulmonary, BPdysplasia, …); ③ encode candidate entities with an Entity Encoder, using their descriptions, structures, etc.; ④ predict the entity by a score function over the encoded mention vs. the encoded candidate entities.
  11. 12/42 Problem C: mention-entity cross attention was not considered. Example: "Bronchopulmonary Dysplasia was first described by Northway as a lung injury." Previous approach: ① encode the mention/context with a Mention Encoder; ② generate candidate entities for the mention; ③ encode candidate entities with an Entity Encoder, using their descriptions, structures, etc.; ④ predict the entity by a score function over the encoded mention vs. the encoded candidate entities. This is a fixed-vector comparison.
  12. 13/42 Problem C: mention-entity cross attention was not considered. Example: "Bronchopulmonary Dysplasia was first described by Northway as a lung injury." Previous approach: ① encode the mention/context with a Mention Encoder; ② generate candidate entities for the mention; ③ encode candidate entities with an Entity Encoder, using their descriptions, structures, etc.; ④ predict the entity by a score function over the encoded mention vs. the encoded candidate entities. The mention-description interaction was ignored.
  13. 14/42 Baselines / their contributions. Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19].
  14. 15/42 Baselines / their contributions. Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]. Main contribution: Logeswaran et al. used surface-based candidate generation (CG).
  15. 16/42 Baselines / their contributions. Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]. Main contribution: Logeswaran et al. used surface-based CG → change this to embedding search and show higher recall.
  16. 17/42 Baselines / their contributions. Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]. Main contribution: Logeswaran et al. used surface-based CG → change this to embedding search and show higher recall. Sub-contribution: Logeswaran et al. used a slow cross-encoder (details later).
  17. 18/42 Baselines / their contributions. Baseline: Zero-Shot Entity Linking by Reading Entity Descriptions [Logeswaran et al., ACL'19]. Main contribution: Logeswaran et al. used surface-based CG → change this to embedding search and show higher recall. Sub-contribution: Logeswaran et al. used a slow cross-encoder (details later) → compare it with the fast bi-encoder [Humeau et al., ICLR'20 poster].
  18. 20/42 Encoder structure (A): bi-encoder [Humeau et al., ICLR'20]. Mention and entity are encoded separately into [CLS] vectors; entity vectors can be cached for fast search.
  19. 21/42 Encoder structure (A): bi-encoder [Humeau et al., ICLR'20]. Mention and entity are encoded separately into [CLS] vectors; entity vectors can be cached for fast search, but cross-attention between them cannot be considered. (See the sketch below.)
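
A minimal numpy sketch of the bi-encoder idea, assuming two independent towers that each produce a fixed vector: entity vectors are computed once and cached, and linking reduces to a matrix-vector product. The encoder here is a random stand-in, not BERT.

```python
import numpy as np

DIM = 128

def encode(text):
    # Stand-in for a tower producing a [CLS] vector; deterministic per input.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.normal(size=DIM)

# Entity side: encode every KB entity ONCE and cache the matrix offline.
entities = ["Acute lymphocytic leukemia", "All Sites", "Alleviation"]
entity_matrix = np.stack([encode(e) for e in entities])

# Mention side: encoded at query time; scoring all entities is one product.
mention_vec = encode("The patient was diagnosed with ALL last year")
scores = entity_matrix @ mention_vec
print(entities[int(scores.argmax())])
# Note: the two towers never attend to each other's tokens.
```
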
  20. 22/42 Encoder structure (B): cross-encoder [Devlin et al., '18; Logeswaran et al., ACL'19]. For each generated candidate entity per mention, consider mention-entity cross attention.
  21. 23/42 Encoder structure (B): cross-encoder [Devlin et al., '18; Logeswaran et al., ACL'19]. For each generated candidate entity per mention, the input is: [CLS] mention context [ENT] candidate entity descriptions, plus an embedding indicating the mention location.
  22. 24/42 Encoder structure (B): cross-encoder [Devlin et al., '18; Logeswaran et al., ACL'19]. For each generated candidate entity per mention, the input is: [CLS] mention context [ENT] candidate entity descriptions, plus an embedding indicating the mention location. The [CLS] output is used for scoring.
  23. 25/42 Encoder structure (B): cross-encoder [Devlin et al., '18; Logeswaran et al., ACL'19]. For each generated candidate entity per mention, the input is: [CLS] mention context [ENT] candidate entity descriptions, plus an embedding indicating the mention location. The [CLS] output is used for scoring. This considers mention-entity cross attention.
  24. 26/42 Encoder structure (B): cross-encoder [Devlin et al., '18; Logeswaran et al., ACL'19]. For each generated candidate entity per mention, the input is: [CLS] mention context [ENT] candidate entity descriptions, plus an embedding indicating the mention location. The [CLS] output is used for scoring. This considers mention-entity cross attention, but inference is slow, since it runs once per mention-candidate pair. (See the sketch below.)
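
A sketch of how a cross-encoder input could be assembled per mention-candidate pair, following the [CLS] … [ENT] … layout above; the scorer is a stub standing in for BERT plus a learned scoring vector.

```python
import numpy as np

def build_input(mention_context, candidate_description):
    # One sequence per (mention, candidate): the pair is encoded jointly,
    # so every layer can attend across mention and description tokens.
    return f"[CLS] {mention_context} [ENT] {candidate_description}"

def score(pair_text):
    # Stub for: h = BERT(pair_text)[CLS]; score = w . h, with w learned.
    rng = np.random.default_rng(abs(hash(pair_text)) % 2**32)
    h, w = rng.normal(size=64), rng.normal(size=64)
    return float(w @ h)

candidates = {
    "Acute lymphocytic leukemia": "A cancer of the lymphoid line ...",
    "Alleviation": "The act of making something easier ...",
}
ctx = "The patient was diagnosed with ALL last year"
best = max(candidates, key=lambda e: score(build_input(ctx, candidates[e])))
print(best)  # one full forward pass per candidate -> slow inference
```
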
  25. 28/42 Optimization and evaluation. Optimization: based on gold / random negative sampling. Evaluation: Recall@64 (is the gold entity among the top-64 scored entities?); accuracy (is the top-1 scored entity the gold one?).
  26. 29/42 Optimization and evaluation. Optimization: based on gold / random negative sampling. Evaluation: Recall@64 (is the gold entity among the top-64 scored entities?); accuracy (is the top-1 scored entity the gold one?); normalized accuracy (evaluated only on mentions for which candidate generation succeeded). (A sketch of the three metrics follows.)
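
A small sketch of the three metrics, assuming that for each mention we have a gold entity id and a ranked candidate list (empty when candidate generation failed); the function and variable names are mine, not the paper's.

```python
def evaluate(golds, ranked_lists, k=64):
    # golds: gold entity id per mention; ranked_lists: candidates per
    # mention, sorted by score (possibly empty if CG failed).
    n = len(golds)
    recall_at_k = sum(g in r[:k] for g, r in zip(golds, ranked_lists)) / n
    acc = sum(bool(r) and r[0] == g for g, r in zip(golds, ranked_lists)) / n
    # Normalized accuracy: only mentions whose candidate set contains gold.
    solvable = [(g, r) for g, r in zip(golds, ranked_lists) if g in r]
    norm_acc = sum(r[0] == g for g, r in solvable) / max(len(solvable), 1)
    return recall_at_k, acc, norm_acc

print(evaluate(golds=["e1", "e2"], ranked_lists=[["e1", "e3"], ["e4"]]))
```
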
  27. 30/42 Result (1): BM25 vs. bi-encoder brute-force search, on the zero-shot dataset (compared: cross-encoder, bi-encoder + BT; BT: brute-force search).
  28. 31/42 Result (1): BM25 vs. bi-encoder brute-force search, on the zero-shot dataset; both settings used the cross-encoder (compared: cross-encoder, bi-encoder + BT; BT: brute-force search). (A brute-force search sketch follows.)
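
"Bi-encoder + BT" candidate generation can be sketched as an exact top-k search over the cached entity matrix; a minimal numpy version (np.argpartition avoids a full sort):

```python
import numpy as np

rng = np.random.default_rng(0)
entity_matrix = rng.normal(size=(100_000, 128))  # cached entity vectors
mention_vec = rng.normal(size=128)

scores = entity_matrix @ mention_vec           # brute force: score ALL entities
top64 = np.argpartition(-scores, 64)[:64]      # unordered top-64 entity ids
top64 = top64[np.argsort(-scores[top64])]      # order them by score
print(top64[:5])
```
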
  29. 33/42 Result (3): bi-encoder vs. cross-encoder, on the TAC-KBP10 dataset. Bi-encoder: fast, but cannot consider mention-entity cross attention. Cross-encoder: slow, but considers it.
  30. 34/42 Result (3): bi-encoder vs. cross-encoder, on the TAC-KBP10 dataset. Bi-encoder: fast, but cannot consider mention-entity cross attention. Cross-encoder: slow, but considers it. (Compared: cross-encoder + BT, bi-encoder + BT, simple-encoder + BT; BT: brute-force search.)
  31. 35/42 Conclusions. A fast and scalable EL model for new/general domains. Even with cross-attention removed, the fast EL model achieves good accuracy.
  32. 37/42 Entity Linking future directions (1): distant / no-label situations. [Le and Titov, ACL'19a]: surface match + multi-instance learning. [Le and Titov, ACL'19b]: spaCy + Wikipedia hyperlink edges/statistics.
  33. 39/42 Entity Linking future directions (2): improving entity representations. Models can be organized by two questions: do they require "entity-span" annotations, and do they use relations? JointEnt [Yamada et al., ACL'17]; KnowBert [Peters et al., EMNLP'19] (indirectly annotated data used); KEPLER [Wang et al., Nov '19]; DEER [Gillick et al., CoNLL'19]; ERNIE [Zhang et al., ACL'19]; BertEnt [Yamada et al., '19]; EntEval [Chen et al., EMNLP'19]; WKLM [Xiong et al., ICLR'20].
  34. 40/42 Entity Linking future directions (2): improving entity representations. Models can be organized by two questions: do they require "entity-span" annotations, and do they use relations? JointEnt [Yamada et al., ACL'17]; KnowBert [Peters et al., EMNLP'19] (indirectly annotated data used); KEPLER [Wang et al., Nov '19]; DEER [Gillick et al., CoNLL'19]; ERNIE [Zhang et al., ACL'19]; BertEnt [Yamada et al., '19]; EntEval [Chen et al., EMNLP'19]; WKLM [Xiong et al., ICLR'20]. Various evaluation settings exist: entity typing, entity disambiguation, fact completion, QA, …
  35. 41/42 Entity Linking future directions (3): no need for entity descriptions? [Chen et al., EMNLP'19] introduced 8 entity-evaluation tasks, e.g.: Rare: rare-entity prediction (a cloze task) in documents; CoNLL: named entity disambiguation; ERT: relation typing between two entities; …
  36. 42/42 Entity Linking future directions (3): no need for entity descriptions? [Chen et al., EMNLP'19]. Rare: rare-entity prediction (a cloze task) in documents; CoNLL: named entity disambiguation; ERT: relation typing between two entities.
  37. 45/42 EntEval 8 tasks [Chen et al., EMNLP'19]: Rare (rare-entity prediction, a cloze task, in documents); CoNLL (named entity disambiguation); ERT (relation typing between two entities); ET (entity typing); ESR (entity similarity and relatedness); CAP (coreference arc prediction); EFP (entity factuality prediction); CERP (contextualized entity relationship prediction).
  38. 46/42 [Logeswaran et al., ACL'19]'s contributions: proposing zero-shot EL (for (A), in-domain-limited EL); showing context-description attention is crucial for EL (for (B), mention-entity interaction); proposing DA pre-training for EL (details described later).
  39. 47/42 Pre-assumption ①: entity dictionary. They first presuppose only an entity dictionary: each entity paired with its description.
  40. 48/42 Pre-assumption ②: worlds (W). Each world W has its own: entity dictionary (each entity with its description); documents belonging to W; and labeled spans (mention annotations) in those documents.
  41. 49/42 Pre-assumption ②: worlds (W). Each world W has its own: entity dictionary (each entity with its description), constructed from Wikia pages; documents belonging to W; and labeled spans (mention annotations) in those documents. (See the sketch below.)
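
A sketch of these per-world assumptions as plain data structures; the field names are mine, not the paper's notation.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    name: str
    description: str  # each entity comes with its own description

@dataclass
class World:
    name: str  # e.g. a single Wikia/Fandom wiki
    entities: list = field(default_factory=list)   # its entity dictionary
    documents: list = field(default_factory=list)  # documents belonging to W
    # labeled spans: (doc index, char start, char end, gold entity name)
    mentions: list = field(default_factory=list)
```
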
  42. 51/42 Pre-assumption ②: worlds (W). Example: each entity with its description, and mentions in documents, constructed from Wikia collections (e.g., meninblack.fandom.com/wiki/Frank_the_Pug).
  43. 52/42 Pre-assumption ②: worlds (W). Example: each entity with its description, and mentions in documents; both the dictionary and the documents are constructed from the same Wikia collections (e.g., meninblack.fandom.com/wiki/Frank_the_Pug).
  44. 53/42 Pre-assumption ②: worlds (W): entity dictionary (entities with descriptions); documents belonging to W; labeled spans in those documents.
  45. 54/42 Pre-assumption ②: worlds (W): entity dictionary (entities with descriptions); documents belonging to W; labeled spans in those documents. These labeled spans are the supervision for "Entity Linking".
  46. 55/42 Pre-assumption ②: worlds (W): entity dictionary (entities with descriptions); documents belonging to W; labeled spans in those documents. The documents are down-sampled; the remaining documents are preserved as a corpus for domain-adaptive pre-training.
  47. 56/42 Previous pre-training of LMs vs. DA pre-training (LM: language model; DA: domain-adaptive; src: source; tgt: target). Task-adaptive pre-training: learn with the src + tgt corpus → fine-tune with the src corpus to solve a specific task (e.g., NER); the tgt corpus is supposed to be small.
  48. 57/42 Previous pre-training of LMs vs. DA pre-training (LM: language model; DA: domain-adaptive; src: source; tgt: target). Task-adaptive pre-training: learn with the src + tgt corpus → fine-tune with the src corpus to solve a specific task (e.g., NER); the tgt corpus is supposed to be small. Open-corpus pre-training: learn with a massive src + tgt corpus (e.g., ELMo, BERT, SciBERT, …).
  49. 58/42 Previous pre-training of LMs vs. DA pre-training (LM: language model; DA: domain-adaptive; src: source; tgt: target). Task-adaptive pre-training: learn with the src + tgt corpus → fine-tune with the src corpus to solve a specific task (e.g., NER); the tgt corpus is supposed to be small. Open-corpus pre-training: learn with a massive src + tgt corpus (e.g., ELMo, BERT, SciBERT, …). Domain-adaptive pre-training (DAP, proposed): pre-train only on the tgt corpus. (See the sketch below.)
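
The three regimes differ mainly in which unlabeled corpus the LM sees before task fine-tuning; a minimal sketch of that choice (the strategy names are mine):

```python
def pretraining_corpus(strategy, src_docs, tgt_docs):
    # src: e.g. Wikipedia + BookCorpus; tgt: the small target-domain corpus.
    if strategy == "task_adaptive":
        return src_docs + tgt_docs  # then fine-tune on the src task corpus
    if strategy == "open_corpus":
        return src_docs + tgt_docs  # same mix, at massive scale (BERT-style)
    if strategy == "domain_adaptive":
        return tgt_docs             # DAP: pre-train ONLY on the target corpus
    raise ValueError(strategy)

print(len(pretraining_corpus("domain_adaptive", ["s"] * 1000, ["t"] * 10)))
```
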
  50. 61/42 Their contributions: proposing zero-shot EL (for (A), in-domain-limited EL); showing context-description attention is crucial for EL (for (B), mention-entity interaction); proposing DA pre-training for EL (details described later).
  51. 62/42 (B) Context-description interaction model. For each generated candidate entity per mention: (i) full-transformer model (proposed). Input: [CLS] mention context [SEP] entity descriptions [Devlin et al., '18], plus an embedding indicating the mention location.
  52. 63/42 (B) Context-description interaction model: (i) full-transformer model (proposed). Output: the [CLS] vector [Devlin et al., '18]. Candidates are scored by the dot product of the [CLS] output with a learned vector.
  53. 64/42 (B) Context-description interaction model: (ii) pool-transformer model (for comparison). The mention context ([CLS] … [SEP]) and the entity descriptions ([CLS] … [SEP]) are encoded separately; scoring compares the two [CLS] outputs [Devlin et al., '18].
  54. 65/42 (B) Context-description interaction model: (iii) cand-pool-transformer model (for comparison) [Devlin et al., '18]. The input is the same as the pool-transformer's.
  55. 66/42 (B) Context-description interaction model: (iii) cand-pool-transformer model (for comparison) [Devlin et al., '18]. The input is the same as the pool-transformer's, but the description representation is used to attend to the mention.
  56. 67/42 (B) Context-description interaction model: (iii) cand-pool-transformer model (for comparison). Scoring follows [Ganea and Hofmann, '17], with K candidates per mention. (A loss sketch follows.)
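
Scoring over the K candidates of one mention is typically trained with a softmax cross-entropy over the candidate scores; a minimal numpy sketch (the scores are random stand-ins):

```python
import numpy as np

def candidate_loss(scores, gold_idx):
    # scores: length-K array, one score per candidate of this mention.
    # Softmax over the K candidates, cross-entropy against the gold one.
    scores = scores - scores.max()  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())
    return float(-log_probs[gold_idx])

K = 5
scores = np.random.default_rng(0).normal(size=K)
print(candidate_loss(scores, gold_idx=2))
```
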
  57. 69/42 (A): Is the DAP strategy effective for domain adaptation? src: Wikipedia + BookCorpus; training corpus: 8 worlds, apart from dev and test; evaluation worlds: Coronation Street, Muppets, Ice Hockey, Elder Scrolls. DAP is effective.
  58. 71/42 Conclusions / their contributions: proposing zero-shot EL (for (A), in-domain-limited EL); showing context-description attention is crucial for EL (for (B), mention-entity interaction); proposing DA pre-training for EL.