Slide 1

Zero-Shot Entity Linking by Reading Entity Descriptions (supplement slides)
Unofficial slides by @izuna385

Slide 2

Previous Entity Linking (EL)
0. Learn/prepare entity (in-KB) representations
1. Prepare mention/context vector
2. Candidate generation
3. Linking

Slide 3

Previous Entity Linking
0. Learn/prepare entity (in-KB) representations
1. Prepare mention/context vector
2. Candidate generation
3. Linking
(A toy sketch of these steps follows below.)
Problems:
A. In-domain limited
B. Mention-entity cross attention is not considered
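As a concrete illustration of steps 0-3, here is a toy Python sketch of the conventional pipeline; the two-entry KB, the bag-of-words encoder, and the word-overlap candidate generator are illustrative assumptions, not components from the paper.

```python
import numpy as np

# Toy sketch of the conventional EL pipeline above. The bag-of-words encoder,
# the word-overlap candidate generator, and the two-entry KB are all
# illustrative assumptions, not the paper's components.

KB = {
    "Bronchopulmonary_dysplasia": "bronchopulmonary dysplasia is a chronic lung disease of newborns",
    "Pulmonary_dysplasia": "pulmonary dysplasia refers to abnormal lung development",
}
VOCAB = sorted({w for d in KB.values() for w in d.split()})

def embed(text):
    # steps 0/1: crude bag-of-words vectors for entities and for the mention/context
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

def generate_candidates(mention):
    # step 2: keep entities whose description shares a word with the mention
    m_words = set(mention.lower().split())
    return [e for e, d in KB.items() if m_words & set(d.split())]

def link(mention, context):
    # step 3: pick the candidate whose description vector best matches
    m = embed(mention + " " + context)
    return max(generate_candidates(mention), key=lambda e: float(m @ embed(KB[e])))

print(link("Bronchopulmonary Dysplasia",
           "was first described by Northway as a lung injury"))
```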

Slide 4

EL Problems: A. In-domain limited
• Although recent Wikipedia-based EL research has seen great success, that success is partly due to massive mention-entity pairs (~1B) and a substantial alias table for candidate generation; the alias table is likewise built from abundant hyperlinks.
• Under specific domains, these annotations are limited and expensive.
• "Therefore, we need entity linking systems that can generalize to unseen specialized entities."

Slide 5

EL Problems: B. Mention-entity cross attention was not considered
• Previous approach: compare the encoded mention against encoded candidate entities.
• Example: "Bronchopulmonary Dysplasia was first described by Northway as a lung injury."
  - Mention Encoder: mention/context encoding
  - Candidate entity generation for one mention (e.g. Dysplasia, Pulmonary, BPdysplasia, …)
  - Entity Encoder: encode candidate entities using their descriptions, structures, etc.
  - Predict the entity by a score function

Slide 6

EL Problems: B. Mention-entity cross attention was not considered
• Previous approach: compare the encoded mention against encoded candidate entities.
• Example: "Bronchopulmonary Dysplasia was first described by Northway as a lung injury."
  - Mention Encoder: mention/context encoding
  - Candidate entity generation for one mention (e.g. Dysplasia, Pulmonary, BPdysplasia, …)
  - Entity Encoder: encode candidate entities using their descriptions, structures, etc.
  - Predict the entity by a score function
• → The mention-description interaction was ignored. (A sketch of this two-tower setup follows below.)
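A minimal sketch of that "previous" two-tower setup, to make the missing interaction explicit: the mention encoder and the entity encoder never see each other's tokens, so only the final dot product connects them. The GRU towers, vocabulary size, and dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Two independent encoders: no mention-description cross attention is possible,
# because each tower only attends within its own input.

class Tower(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, ids):          # ids: (batch, seq_len) token ids
        _, h = self.rnn(self.emb(ids))
        return h[-1]                 # (batch, dim) pooled representation

mention_encoder, entity_encoder = Tower(), Tower()
mention_vec = mention_encoder(torch.randint(0, 1000, (1, 16)))   # mention + context
entity_vecs = entity_encoder(torch.randint(0, 1000, (5, 32)))    # 5 candidate descriptions
scores = mention_vec @ entity_vecs.T        # (1, 5) dot-product score function
predicted = scores.argmax(dim=-1)           # index of the linked candidate
```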

Slide 7

Their Contributions
• Proposing zero-shot EL
• Showing that context-description attention is crucial for EL
• Proposing DA pre-training for EL (details described later)
(A) for in-domain-limited EL, (B) for mention-entity interaction

Slide 8

Pre-assumption ①: Entity dictionary
• They first presuppose only an entity dictionary {(e_i, d_i)}.
  - e_i: entity
  - d_i: its description

Slide 9

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - E_W = {(e_i, d_i)}: entity dictionary (e_i: entity, d_i: its description)
  - D_W: documents belonging to W
  - M_W: labeled spans in D_W, annotated with entities from E_W

Slide 10

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - E_W = {(e_i, d_i)}: entity dictionary (e_i: entity, d_i: its description), constructed from the world's pages
  - D_W: documents belonging to W
  - M_W: labeled spans in D_W, annotated with entities from E_W
(A rough data-model sketch follows below.)
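A rough data-model sketch of these two pre-assumptions; the dataclass layout and field names are mine, not the paper's, and the description string is only a placeholder.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Entity:
    name: str            # e_i : entity
    description: str     # d_i : its description, constructed from the world's pages

@dataclass
class Mention:
    surface: str         # labeled span in one of the world's documents
    gold_entity: str     # annotation drawn from the world's own entity dictionary

@dataclass
class World:
    name: str
    entities: List[Entity] = field(default_factory=list)    # E_W: entity dictionary
    documents: List[str] = field(default_factory=list)      # D_W: documents belonging to W
    mentions: List[Mention] = field(default_factory=list)   # M_W: labeled spans in D_W

mib = World("men_in_black")
mib.entities.append(Entity("Frank_the_Pug",
                           "<description text from meninblack.fandom.com/wiki/Frank_the_Pug>"))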

Slide 11

Pre-assumption ②: Worlds (W)
• Example world: a Wikia/Fandom wiki (e.g. meninblack.fandom.com)
  - entity: e.g. Frank_the_Pug (meninblack.fandom.com/wiki/Frank_the_Pug)
  - its description
  - documents / mentions: constructed from the wiki's collections

Slide 12

Pre-assumption ②: Worlds (W)
• Example world: a Wikia/Fandom wiki (e.g. meninblack.fandom.com)
  - entity: e.g. Frank_the_Pug (meninblack.fandom.com/wiki/Frank_the_Pug)
  - its description: constructed from the entity's wiki page
  - documents / mentions: constructed from the wiki's collections

Slide 13

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - E_W = {(e_i, d_i)}: entity dictionary (e_i: entity, d_i: its description)
  - D_W: documents belonging to W
  - M_W: labeled spans in D_W, annotated with entities from E_W
• There are multiple such worlds (…).

Slide 14

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - E_W = {(e_i, d_i)}: entity dictionary (e_i: entity, d_i: its description)
  - D_W: documents belonging to W
  - M_W: labeled spans in D_W, annotated with entities from E_W
• There are multiple such worlds (…).
• This is the data for "Entity Linking".

Slide 15

Pre-assumption ②: Worlds (W)
• Each world W has its own:
  - E_W = {(e_i, d_i)}: entity dictionary (e_i: entity, d_i: its description)
  - D_W: documents belonging to W
  - M_W: labeled spans in D_W, annotated with entities from E_W
• The labeled documents are down-sampled; the other documents are preserved as a corpus for domain-adaptive pre-training. (A sketch of this split follows below.)
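A minimal sketch of that split, assuming a simple random down-sampling; the size and the random split are assumptions, not the paper's exact procedure.

```python
import random

def split_world(documents, n_linking=1000, seed=0):
    # Down-sample a portion of the world's documents for the entity-linking data
    # and preserve the rest as an unlabeled corpus for domain-adaptive pre-training.
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    linking_docs = docs[:n_linking]   # down-sampled documents used for EL data
    dap_corpus = docs[n_linking:]     # the other documents: DAP corpus
    return linking_docs, dap_corpus
```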

Slide 16

Previous LM pretraining vs DA pretraining
• Task-adaptive pretraining: learn with the src + tgt corpus → fine-tune with the src corpus for solving a specific task (e.g. NER). (The tgt corpus is supposed to be small.)
(LM: language model, DA: domain-adaptive, src: source, tgt: target)

Slide 17

Previous LM pretraining vs DA pretraining
• Task-adaptive pretraining: learn with the src + tgt corpus → fine-tune with the src corpus for solving a specific task (e.g. NER). (The tgt corpus is supposed to be small.)
• Open-corpus pretraining: learn with a massive src + tgt corpus (e.g. ELMo, BERT, SciBERT, …).
(LM: language model, DA: domain-adaptive, src: source, tgt: target)

Slide 18

Previous LM pretraining vs DA pretraining
• Task-adaptive pretraining: learn with the src + tgt corpus → fine-tune with the src corpus for solving a specific task (e.g. NER). (The tgt corpus is supposed to be small.)
• Open-corpus pretraining: learn with a massive src + tgt corpus (e.g. ELMo, BERT, SciBERT, …).
• Domain-adaptive pretraining (DAP) (proposed): pretrain only on the tgt corpus. (A minimal sketch follows below.)
(LM: language model, DA: domain-adaptive, src: source, tgt: target)
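A minimal sketch of DAP under the usual masked-LM recipe, assuming the Hugging Face transformers Trainer API; the toy target-world corpus and hyperparameters are placeholders, not the paper's settings.

```python
import torch
from transformers import (BertTokenizerFast, BertForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Continue masked-LM training of an already pre-trained BERT on target-world
# text only, before fine-tuning on the linking task.

tgt_corpus = ["<target-world document 1>",
              "<target-world document 2>"]   # unlabeled documents of the target world

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

class LineDataset(torch.utils.data.Dataset):
    def __init__(self, lines):
        self.enc = tok(lines, truncation=True, max_length=128)
    def __len__(self):
        return len(self.enc["input_ids"])
    def __getitem__(self, i):
        return {k: torch.tensor(v[i]) for k, v in self.enc.items()}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dap_ckpt", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=LineDataset(tgt_corpus),
    data_collator=DataCollatorForLanguageModeling(tok, mlm_probability=0.15),
)
trainer.train()   # afterwards, fine-tune this checkpoint on the EL task
```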

Slide 19

When and Why DAP?

Slide 20

How to prepare the src/tgt corpus for fine-tuning the LM?

Slide 21

Their Contributions
• Proposing zero-shot EL
• Showing that context-description attention is crucial for EL
• Proposing DA pre-training for EL (details described later)
(A) for in-domain-limited EL, (B) for mention-entity interaction

Slide 22

(B) Context-description interaction model
• For each generated candidate entity per mention:
(i) Full-transformer model (proposed) [Devlin et al., '18]
  - input: [CLS] mention context [SEP] entity descriptions
  - a special embedding marks the mention location

Slide 23

(B) Context-description interaction model
• For each generated candidate entity per mention:
(i) Full-transformer model (proposed) [Devlin et al., '18]
  - output: the [CLS] representation h_CLS
  - scoring candidates: s = w^T h_CLS (w: learned vector)
(A code sketch follows below.)
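A code sketch of this cross-encoder scoring, assuming a Hugging Face BERT encoder; the mention-location embedding of the paper is omitted here, and a linear layer stands in for the learned vector w.

```python
import torch
from transformers import BertTokenizerFast, BertModel

# Mention context and entity description are encoded jointly in one sequence,
# so every layer can attend across the [SEP] boundary; the [CLS] output is
# scored with a learned vector.

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
w = torch.nn.Linear(bert.config.hidden_size, 1)    # learned scoring vector

def score(mention_context: str, description: str) -> torch.Tensor:
    enc = tok(mention_context, description, return_tensors="pt", truncation=True)
    h_cls = bert(**enc).last_hidden_state[:, 0]     # [CLS] representation
    return w(h_cls).squeeze(-1)                     # scalar score  w^T h_CLS

s = score("Bronchopulmonary Dysplasia was first described by Northway ...",
          "Bronchopulmonary dysplasia (BPD) is a chronic lung disease ...")
```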

Slide 24

(B) Context-description interaction model
• For each generated candidate entity per mention:
(ii) Pool-transformer model (for comparison) [Devlin et al., '18]
  - two separate encoders: [CLS] mention context [SEP] and [CLS] entity descriptions [SEP]
  - output: the two [CLS] representations
  - scoring: compare the two [CLS] vectors

Slide 25

(B) Context-description interaction model
• For each generated candidate entity per mention:
(iii) Cand-Pool-transformer model (for comparison) [Devlin et al., '18]
  - the input is the same: [CLS] mention context [SEP] and [CLS] entity descriptions [SEP], encoded separately

Slide 26

(B) Context-description interaction model
• For each generated candidate entity per mention:
(iii) Cand-Pool-transformer model (for comparison) [Devlin et al., '18]
  - uses the description token representations (d) attending to the mention (see the sketch below)
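A generic sketch of one way description token representations can attend to mention tokens and be pooled; the single-head scaled dot-product attention and the mean pooling are assumptions, and the paper's exact formulation may differ.

```python
import torch

# Description tokens attend to mention/context tokens, then the attended
# representations are pooled into one candidate vector.

dim = 128
mention_tokens = torch.randn(1, 16, dim)    # encoded mention/context tokens
desc_tokens = torch.randn(1, 64, dim)       # encoded candidate-description tokens

attn = torch.softmax(desc_tokens @ mention_tokens.transpose(1, 2) / dim ** 0.5, dim=-1)
attended = attn @ mention_tokens            # (1, 64, dim): mention info per description token
pooled = attended.mean(dim=1)               # (1, dim): pooled candidate representation
```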

Slide 27

(B) Context-description interaction model
• For each generated candidate entity per mention:
(iii) Cand-Pool-transformer model (for comparison) [Ganea and Hofmann, '17]
  - K: candidates per mention
  - scoring over the K candidates (see the sketch below)
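A small sketch of scoring K candidates per mention with a softmax and a cross-entropy training loss; the random scores and K = 5 are placeholders for the model scores produced above.

```python
import torch
import torch.nn.functional as F

K = 5
scores = torch.randn(1, K, requires_grad=True)   # one mention, K candidate scores
gold = torch.tensor([2])                          # index of the gold entity
loss = F.cross_entropy(scores, gold)              # -log softmax(scores)[gold]
probs = F.softmax(scores, dim=-1)                 # normalized candidate probabilities
pred = probs.argmax(dim=-1)                       # predicted candidate
loss.backward()
```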

Slide 28

Results when changing the resource used for pretraining the LM
• NOTE: this is not DAP.

Slide 29

(A) Is the DAP strategy effective for domain adaptation?
• Target (test) worlds: Coronation Street, Muppets, Ice Hockey, Elder Scrolls
• Source pretraining corpora: Wikipedia + BookCorpus, and the 8 worlds apart from dev and test
• → DAP is effective.

Slide 30

(B) Is mention-entity description attention powerful?
• → Mention-entity cross-attention is effective.

Slide 31

Conclusions / Their Contributions
• Proposing zero-shot EL
• Showing that context-description attention is crucial for EL
• Proposing DA pre-training for EL
(A) for in-domain-limited EL, (B) for mention-entity interaction