• Wikipedia-based EL research has seen great success, partly due to massive mention-entity pairs (~1B): a substantial alias table for candidate generation can be built from the abundant hyperlinks.
• In specialized domains, such annotations are limited and expensive.
• “Therefore, we need entity linking systems that can generalize to unseen specialized entities.”
B. Mention-entity cross attention was not considered.
• Previous approach: encode the mention/context (Mention Encoder); generate candidate entities for each mention; encode the candidate entities from their descriptions, structures, etc. (Entity Encoder); predict the entity by a score function between the encoded mention and the encoded candidates.
(Figure: a mention in a “lung injury” context scored against candidate entities such as Dysplasia, Pulmonary …, BP dysplasia, …)
• The mention–description interaction was ignored; see the sketch below.
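A minimal sketch of this two-tower setup, assuming toy mean-pooling encoders and a dot-product score function (all names are illustrative, not any specific system's architecture): the mention and the candidates only meet through the final score, with no token-level cross attention.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Toy encoder: mean-pooled token embeddings (stand-in for an LSTM/CNN)."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(token_ids).mean(dim=1)        # (batch, dim)

mention_encoder = TextEncoder(vocab_size=1000)        # Mention Encoder
entity_encoder = TextEncoder(vocab_size=1000)         # Entity Encoder

mention = torch.randint(0, 1000, (1, 16))             # one mention in context
candidates = torch.randint(0, 1000, (5, 32))          # 5 candidate descriptions

m = mention_encoder(mention)                          # (1, dim)
e = entity_encoder(candidates)                        # (5, dim)
scores = e @ m.squeeze(0)                             # dot-product score function
predicted = scores.argmax().item()                    # highest-scoring candidate
# m and e never attend to each other's tokens: this is exactly the missing
# mention-description interaction the slide points out.
```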
• Each world W provides: an entity dictionary (each entity with its description); documents belonging to W; labeled spans in the documents, annotated by …
• Mentions are down-sampled; the remaining documents are preserved as a corpus for domain-adaptive pre-training.
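A possible data layout for one such world W, with hypothetical field names (the paper's exact notation is not reproduced here):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Entity:
    entity_id: str
    description: str                  # text the entity encoder reads

@dataclass
class Mention:
    doc_id: str
    span: Tuple[int, int]             # character offsets of the labeled span
    gold_entity_id: str               # the annotated entity

@dataclass
class World:
    dictionary: Dict[str, Entity]     # entity dictionary of this world W
    mentions: List[Mention]           # labeled (down-sampled) spans
    pretrain_corpus: List[str] = field(default_factory=list)
    # pretrain_corpus: the remaining documents of W, reserved for
    # domain-adaptive pre-training rather than for supervision.
```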
• Task-adaptive pre-training: LM trained on the src + tgt corpus → fine-tuned on the src corpus for a specific task (e.g., NER). (The tgt corpus is supposed to be small.)
• Open-corpus pre-training: LM trained on a massive corpus covering src + tgt (e.g., ELMo, BERT, SciBERT, …).
• Domain-adaptive pre-training (DAP) (proposed): LM additionally pre-trained only on the tgt corpus, as the final stage before fine-tuning. (A sketch of the three regimes follows.)
LM: language model / DA: domain-adaptive / src: source / tgt: target
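A sketch of the three regimes; pretrain_lm and finetune are placeholder stubs standing in for masked-LM training and task-specific training, not a real API:

```python
def pretrain_lm(model, corpus):
    """Placeholder for a language-model pre-training stage."""

def finetune(model, labeled_data):
    """Placeholder for supervised task training (e.g., EL or NER)."""

def task_adaptive(model, src_corpus, tgt_corpus, src_labeled):
    pretrain_lm(model, src_corpus + tgt_corpus)   # LM on src + tgt unlabeled text
    finetune(model, src_labeled)                  # then task training on src

def open_corpus(model, massive_corpus, src_labeled):
    pretrain_lm(model, massive_corpus)            # e.g., ELMo / BERT / SciBERT
    finetune(model, src_labeled)

def domain_adaptive(model, src_corpus, tgt_corpus, src_labeled):
    pretrain_lm(model, src_corpus + tgt_corpus)   # earlier LM stage(s), as above
    pretrain_lm(model, tgt_corpus)                # DAP: a final stage on tgt only
    finetune(model, src_labeled)
```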
(i) Full-Transformer model (proposed) [Devlin et al., ‘18]
• Scores one candidate entity per mention at a time.
• Input: [CLS] mention context [SEP] entity description [SEP]
• L: an extra embedding indicating the mention location.
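A minimal sketch of this input construction, assuming a small BERT-style encoder built from stock PyTorch layers (token ids, sizes, and the scoring head are made up for illustration):

```python
import torch
import torch.nn as nn

CLS, SEP = 101, 102                                   # BERT-style special ids
context = [7, 8, 9, 10, 11]                           # token ids for the context
mention_pos = {2, 3}                                  # mention = positions 2, 3
description = [20, 21, 22, 23]                        # entity description ids

tokens = [CLS] + context + [SEP] + description + [SEP]
# Mention flag: 1 on mention tokens, 0 elsewhere (shifted by the [CLS] slot).
flags = [0] + [1 if i in mention_pos else 0 for i in range(len(context))]
flags += [0] * (len(description) + 2)

dim = 64
word_emb = nn.Embedding(30000, dim)
flag_emb = nn.Embedding(2, dim)                       # the "L" embedding above
x = word_emb(torch.tensor([tokens])) + flag_emb(torch.tensor([flags]))

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
h = encoder(x)                                        # full cross attention between
                                                      # mention context and description
score = nn.Linear(dim, 1)(h[:, 0])                    # score from the [CLS] output
```

Because both segments share one encoder, every layer applies mention-entity cross attention; the cost is a separate forward pass per candidate.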
(ii) Pool-Transformer model (for comparison) [Devlin et al., ‘18]
• The input is the same, but the mention context ([CLS] mention context [SEP]) and each entity description ([CLS] entity description [SEP]) are encoded separately and interact only through their pooled [CLS] vectors.
(iii) Cand-Pool-Transformer model (for comparison) [Devlin et al., ‘18]
• Same separate encoding, but the pooled description embedding d is additionally used to attend to the mention-context token representations (see the sketch below).
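A rough sketch of the Cand-Pool interaction under assumed shapes (not the exact scoring head of the paper): the pooled description vector d queries the mention-context token outputs in a single attention step.

```python
import torch
import torch.nn.functional as F

dim = 64
mention_tokens = torch.randn(16, dim)     # encoder outputs for mention context
d = torch.randn(dim)                      # pooled [CLS] of one entity description

attn = F.softmax(mention_tokens @ d / dim ** 0.5, dim=0)    # (16,) weights
mention_summary = attn @ mention_tokens                     # (dim,) attended view
score = torch.dot(mention_summary, d)                       # candidate score
# Cheaper than the Full-Transformer (each description is encoded once), but
# the interaction is limited to this single pooled-vector attention step.
```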
• Mention-entity cross attention is crucial for EL: it supplies (B), the missing mention-entity interaction.
• Domain-adaptive pre-training is proposed for EL, targeting (A), EL with limited in-domain data. (Details are described later.)