• Wikipedia-based EL research has seen great success, partly due to massive mention-entity pairs (~1B): a substantial alias table for candidate generation can be built from the abundant hyperlinks.
• In specialized domains, such annotations are limited and expensive.
• “Therefore, we need entity linking systems that can generalize to unseen specialized entities.”
B. Mention-entity cross attention was not considered.
• Previous approach: encode the mention/context (Mention Encoder); generate candidate entities for each mention; encode the candidate entities from their descriptions, structures, etc. (Entity Encoder); predict the entity by a score function between the encoded mention and the encoded candidates.
(Figure: a mention in a “lung injury” context scored against candidate entities such as Dysplasia, Pulmonary …, BP dysplasia, …)
• The mention–description interaction was ignored; see the sketch below.
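A minimal sketch of this two-tower setup, assuming toy mean-pooling encoders and a dot-product score function (all names are illustrative, not any specific system's architecture): the mention and the candidates only meet through the final score, with no token-level cross attention.

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Toy encoder: mean-pooled token embeddings (stand-in for an LSTM/CNN)."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(token_ids).mean(dim=1)        # (batch, dim)

mention_encoder = TextEncoder(vocab_size=1000)        # Mention Encoder
entity_encoder = TextEncoder(vocab_size=1000)         # Entity Encoder

mention = torch.randint(0, 1000, (1, 16))             # one mention in context
candidates = torch.randint(0, 1000, (5, 32))          # 5 candidate descriptions

m = mention_encoder(mention)                          # (1, dim)
e = entity_encoder(candidates)                        # (5, dim)
scores = e @ m.squeeze(0)                             # dot-product score function
predicted = scores.argmax().item()                    # highest-scoring candidate
# m and e never attend to each other's tokens: this is exactly the missing
# mention-description interaction the slide points out.
```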
• Each world W provides: an entity dictionary (each entity with its description); documents belonging to W; labeled spans in the documents, annotated by …
• Mentions are down-sampled; the remaining documents are preserved as a corpus for domain-adaptive pre-training.
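A possible data layout for one such world W, with hypothetical field names (the paper's exact notation is not reproduced here):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Entity:
    entity_id: str
    description: str                  # text the entity encoder reads

@dataclass
class Mention:
    doc_id: str
    span: Tuple[int, int]             # character offsets of the labeled span
    gold_entity_id: str               # the annotated entity

@dataclass
class World:
    dictionary: Dict[str, Entity]     # entity dictionary of this world W
    mentions: List[Mention]           # labeled (down-sampled) spans
    pretrain_corpus: List[str] = field(default_factory=list)
    # pretrain_corpus: the remaining documents of W, reserved for
    # domain-adaptive pre-training rather than for supervision.
```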
• Task-adaptive pre-training: LM trained on the src + tgt corpus → fine-tuned on the src corpus for a specific task (e.g., NER). (The tgt corpus is supposed to be small.)
• Open-corpus pre-training: LM trained on a massive corpus covering src + tgt (e.g., ELMo, BERT, SciBERT, …).
• Domain-adaptive pre-training (DAP) (proposed): LM additionally pre-trained only on the tgt corpus, as the final stage before fine-tuning. (A sketch of the three regimes follows.)
LM: language model / DA: domain-adaptive / src: source / tgt: target
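A sketch of the three regimes; pretrain_lm and finetune are placeholder stubs standing in for masked-LM training and task-specific training, not a real API:

```python
def pretrain_lm(model, corpus):
    """Placeholder for a language-model pre-training stage."""

def finetune(model, labeled_data):
    """Placeholder for supervised task training (e.g., EL or NER)."""

def task_adaptive(model, src_corpus, tgt_corpus, src_labeled):
    pretrain_lm(model, src_corpus + tgt_corpus)   # LM on src + tgt unlabeled text
    finetune(model, src_labeled)                  # then task training on src

def open_corpus(model, massive_corpus, src_labeled):
    pretrain_lm(model, massive_corpus)            # e.g., ELMo / BERT / SciBERT
    finetune(model, src_labeled)

def domain_adaptive(model, src_corpus, tgt_corpus, src_labeled):
    pretrain_lm(model, src_corpus + tgt_corpus)   # earlier LM stage(s), as above
    pretrain_lm(model, tgt_corpus)                # DAP: a final stage on tgt only
    finetune(model, src_labeled)
```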
(i) Full-Transformer model (proposed) [Devlin et al., ‘18]
• Scores one candidate entity per mention at a time.
• Input: [CLS] mention context [SEP] entity description [SEP]
• L: an extra embedding indicating the mention location.
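A minimal sketch of this input construction, assuming a small BERT-style encoder built from stock PyTorch layers (token ids, sizes, and the scoring head are made up for illustration):

```python
import torch
import torch.nn as nn

CLS, SEP = 101, 102                                   # BERT-style special ids
context = [7, 8, 9, 10, 11]                           # token ids for the context
mention_pos = {2, 3}                                  # mention = positions 2, 3
description = [20, 21, 22, 23]                        # entity description ids

tokens = [CLS] + context + [SEP] + description + [SEP]
# Mention flag: 1 on mention tokens, 0 elsewhere (shifted by the [CLS] slot).
flags = [0] + [1 if i in mention_pos else 0 for i in range(len(context))]
flags += [0] * (len(description) + 2)

dim = 64
word_emb = nn.Embedding(30000, dim)
flag_emb = nn.Embedding(2, dim)                       # the "L" embedding above
x = word_emb(torch.tensor([tokens])) + flag_emb(torch.tensor([flags]))

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
h = encoder(x)                                        # full cross attention between
                                                      # mention context and description
score = nn.Linear(dim, 1)(h[:, 0])                    # score from the [CLS] output
```

Because both segments share one encoder, every layer applies mention-entity cross attention; the cost is a separate forward pass per candidate.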
(ii) Pool-Transformer model (for comparison) [Devlin et al., ‘18]
• The input is the same, but the mention context ([CLS] mention context [SEP]) and each entity description ([CLS] entity description [SEP]) are encoded separately and interact only through their pooled [CLS] vectors.
(iii) Cand-Pool-Transformer model (for comparison) [Devlin et al., ‘18]
• Same separate encoding, but the pooled description embedding d is additionally used to attend to the mention-context token representations (see the sketch below).
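A rough sketch of the Cand-Pool interaction under assumed shapes (not the exact scoring head of the paper): the pooled description vector d queries the mention-context token outputs in a single attention step.

```python
import torch
import torch.nn.functional as F

dim = 64
mention_tokens = torch.randn(16, dim)     # encoder outputs for mention context
d = torch.randn(dim)                      # pooled [CLS] of one entity description

attn = F.softmax(mention_tokens @ d / dim ** 0.5, dim=0)    # (16,) weights
mention_summary = attn @ mention_tokens                     # (dim,) attended view
score = torch.dot(mention_summary, d)                       # candidate score
# Cheaper than the Full-Transformer (each description is encoded once), but
# the interaction is limited to this single pooled-vector attention step.
```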
• Mention-entity cross attention is crucial for EL: it supplies (B), the missing mention-entity interaction.
• Domain-adaptive pre-training is proposed for EL, targeting (A), EL with limited in-domain data. (Details are described later.)