Distant Learning for Entity Linking with Automatic Noise Detection

Slide 1

Slide 1 text

1/20 Distant Learning for Entity Linking with Automatic Noise Detection @izuna385

Slide 2

Slide 2 text

2/20 Entity Linking
• Link a mention to a specific entity in a Knowledge Base
Figure labels: entity, Knowledge Base
Beam, Andrew L., et al. "Clinical Concept Embeddings Learned from Massive Sources of Medical Data." arXiv:1804.01486 (2018).

Slide 3

Slide 3 text

3/20 Procedure
1. Prepare mention/context vector
2. Learn/prepare entity (in KB) representation
3. Candidate generation
4. Linking

Slide 4

Slide 4 text

4/20 Procedure
1. Prepare mention/context vector
2. Learn/prepare entity (in KB) representation
3. Candidate generation
4. Linking
Requires: large amount of labeled data; Wikipedia-hyperlink-based alias table

Slide 5

Slide 5 text

5/20 Available: unlabeled (no gold annotations) text + KB

Slide 6

Slide 6 text

6/20

Slide 7

Slide 7 text

7/20 g: scorer for linking. Loss for the linker: expects a high link score from the bag that may contain the gold entity, and a low link score from the negative-sampled bag.
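
The slide only states what the link loss should reward, so the following is a minimal sketch under assumptions rather than the presented formula: it uses a hinge (max-margin) form, and the names `link_loss`, `pos_scores`, `neg_scores`, and `margin` are hypothetical. `pos_scores` stands for the scores g gives to the candidates in the possibly-gold bag, `neg_scores` for the scores of the negative-sampled bag.

```python
def link_loss(pos_scores, neg_scores, margin=0.1):
    """Hinge-style link loss (assumed form, not taken from the slides).

    The best-scoring candidate in the possibly-gold bag should beat the
    best-scoring candidate in the negative-sampled bag by at least `margin`.
    """
    best_pos = max(pos_scores)  # highest link score in the positive bag
    best_neg = max(neg_scores)  # highest link score in the negative bag
    return max(0.0, margin + best_neg - best_pos)


# Positive bag clearly wins: no loss.
print(link_loss([2.0, 0.5], [0.3, 0.1]))
# Negative bag scores higher: positive loss pushes the scorer to fix this.
print(link_loss([0.2, 0.1], [0.9, 0.4]))
```

The loss is zero once the positive bag wins by the margin, so only mentions that are still confused contribute to training.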

Slide 8

Slide 8 text

8/20 Under “Supervised” settings: if candidate generation fails to retrieve the gold entity, we can simply add the gold entity to the bag. ← Shikhar et al., ACL’18

Slide 9

Slide 9 text

9/20 Under “Distant” settings: we cannot know whether the candidate bag contains the gold entity. But to train g on valid data points, we want to know (classify) this.

Slide 10

Slide 10 text

10/20 Representation for the E+ bag. Again, we expect g to assign a high score to the gold (or a near-gold) entity.
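
One way to read this slide is that the E+ bag is summarized by weighting each candidate by its link score, so that the bag representation is dominated by the gold (or near-gold) entity when g works well. The sketch below assumes a softmax over the scores and a weighted sum of entity vectors; this form and the names `softmax` and `bag_representation` are assumptions for illustration, not read off the slide.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representation(entity_vectors, link_scores):
    """Score-weighted average of the candidate entity vectors (assumed form)."""
    weights = softmax(link_scores)
    dim = len(entity_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, entity_vectors))
        for d in range(dim)
    ]


# Two toy candidates; the first has a much higher link score,
# so the bag representation is close to its vector.
rep = bag_representation([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
print(rep)
```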

Slide 11

Slide 11 text

11/20 Noisy/Valid E+ classifier. Inputs: E+ bag representation, contextualized mention. pN: classifies whether the bag for a mention is ‘noisy’ (1) or ‘valid’ (0).

Slide 12

Slide 12 text

12/20 Noisy/Valid E+ classifier. Inputs: E+ bag representation, contextualized mention. pN: classifies whether the bag for a mention is ‘noisy’ (1) or ‘valid’ (0). NOTE: pN does not take mention-candidate surface similarity as input.
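
A minimal sketch of such a classifier, assuming a single logistic layer over the concatenated bag representation and contextualized mention vector. The slides do not give the architecture, so `noise_probability`, `weights`, and `bias` are hypothetical. In line with the note above, no surface-similarity feature is fed in.

```python
import math


def noise_probability(bag_rep, mention_rep, weights, bias=0.0):
    """Return pN, the probability that the bag is noisy (assumed logistic form).

    Inputs are only the E+ bag representation and the contextualized
    mention vector; mention-candidate surface similarity is not used.
    """
    features = list(bag_rep) + list(mention_rep)
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy call with hand-picked weights; values near 1 mean 'noisy', near 0 'valid'.
p_noisy = noise_probability([0.2, 0.8], [0.5, -0.1], [1.0, -2.0, 0.5, 0.3])
print(p_noisy)
```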

Slide 13

Slide 13 text

13/20 P2. LEFT

Slide 14

Slide 14 text

14/20 Loss for training pN (noisy/valid bag classifier) together with the linker: valid (not noisy) prob. × link loss. For possibly valid (= gold entity exists) bags, we sum up the link loss to train the linker, but…

Slide 15

Slide 15 text

15/20 Loss for training pN (noisy/valid bag classifier) together with the linker: valid (not noisy) prob. × link loss. Assigning ‘noisy’ to all bags easily drives the loss to 0, so neither the linker nor the bag classifier can be trained.
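
The collapse described on this slide can be checked with a few lines. The sketch assumes the loss is the sum over bags of (1 − pN) times the link loss, which is how the "valid (not noisy) prob." and "link loss" labels read; `weighted_link_loss` is a hypothetical name.

```python
def weighted_link_loss(noise_probs, link_losses):
    """Sum of (1 - pN) * link loss over bags (assumed reading of the slide)."""
    return sum((1.0 - p) * loss for p, loss in zip(noise_probs, link_losses))


link_losses = [3.0, 5.0]

# Reasonable classifier: both bags contribute to training the linker.
print(weighted_link_loss([0.0, 0.5], link_losses))

# Degenerate classifier: everything is 'noisy', so the loss is 0
# no matter how bad the linker is, and nothing gets trained.
print(weighted_link_loss([1.0, 1.0], link_losses))
```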

Slide 16

Slide 16 text

16/20 Loss for training pN (noisy/valid bag classifier) together with the linker: valid (not noisy) prob. × link loss, plus a term with two parts. Hyperparameter: belief about noisy data points (e.g. 0.9). Mean noisiness value for the document.

Slide 17

Slide 17 text

17/20 Loss for training pN (noisy/valid bag classifier) together with the linker: valid (not noisy) prob. × link loss, plus a term with two parts. Hyperparameter: belief about noisy data points (e.g. 0.9). Mean noisiness value for the document. By adding this loss, we expect the linker to be trained more on data where the gold entity very likely exists.
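
A sketch of how the extra term can block the all-noisy solution. The slide names only the two ingredients (a prior belief such as 0.9 and the document's mean noisiness); comparing them with a KL divergence between two Bernoulli distributions is an assumption made here, as are the names `bernoulli_kl`, `document_loss`, `prior_noise`, and `reg_weight`.

```python
import math


def bernoulli_kl(p, q, eps=1e-8):
    """KL(Bernoulli(p) || Bernoulli(q)), with clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def document_loss(noise_probs, link_losses, prior_noise=0.9, reg_weight=1.0):
    """Weighted link loss plus a penalty on the document's mean noisiness.

    The penalty is zero when the mean of pN over the document equals the
    prior belief `prior_noise`, and grows as the mean drifts away from it.
    """
    weighted = sum((1.0 - p) * loss for p, loss in zip(noise_probs, link_losses))
    mean_noise = sum(noise_probs) / len(noise_probs)
    return weighted + reg_weight * bernoulli_kl(mean_noise, prior_noise)


# Marking every bag as noisy no longer gives zero loss.
print(document_loss([1.0, 1.0], [3.0, 5.0]))
```

With this penalty the classifier has to keep some bags 'valid', and the cheapest ones to keep are those with a low link loss, that is, bags where the gold entity most likely exists.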

Slide 18

Slide 18 text

18/20 1 0

Slide 19

Slide 19 text

19/20 Table 3: Linker error rate on the dev set. Blue: denoising succeeded. Red: denoising failed, due to flaws in candidate generation. ND: denoising bags for training the linker = succeeded at catching the signal of the gold entity in the bag.

Slide 20

Slide 20 text

20/20 Figure 3: confirming that pN separates valid and noisy data. Noisy: bag in which the gold entity does not exist.