1/20
Distant Learning for Entity Linking
with Automatic Noise Detection
@izuna385
2/20
Entity Linking
• Link a mention to a specific entity in a Knowledge Base
Beam, Andrew L., et al. "Clinical Concept Embeddings Learned
from Massive Sources of Medical Data." arXiv:1804.01486 (2018).
4/20
Procedure
1. Prepare Mention/Context vector
2. Learn/prepare entity (in-KB) representation
3. Candidate generation
4. Linking
Typically requires:
• a large amount of labeled data
• a Wikipedia-hyperlink-based alias table (for candidate generation)
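The four steps above can be sketched as a toy pipeline. Everything here is hypothetical: the entity names, the 3-dimensional vectors standing in for learned mention/context and entity representations, the alias table, and the choice of a dot product as the linking score.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy KB: entity embeddings (step 2) keyed by entity ID.
ENTITY_VECS: Dict[str, List[float]] = {
    "Paris_(city)": [0.9, 0.1, 0.0],
    "Paris_Hilton": [0.1, 0.9, 0.0],
    "Paris_(mythology)": [0.0, 0.2, 0.9],
}

# Hypothetical alias table: lower-cased surface form -> candidate entity IDs.
ALIAS_TABLE: Dict[str, List[str]] = {
    "paris": ["Paris_(city)", "Paris_Hilton", "Paris_(mythology)"],
}


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def generate_candidates(surface: str) -> List[str]:
    # Step 3: look the mention's surface form up in the alias table.
    return ALIAS_TABLE.get(surface.lower(), [])


def link(surface: str, mention_vec: Sequence[float]) -> Optional[str]:
    # Step 4: choose the candidate whose entity vector scores highest
    # against the mention/context vector (step 1).
    candidates = generate_candidates(surface)
    if not candidates:
        return None  # candidate generation failed; nothing to link to
    return max(candidates, key=lambda e: dot(mention_vec, ENTITY_VECS[e]))
```

For example, `link("Paris", [0.8, 0.2, 0.1])` picks `"Paris_(city)"`, while a surface form missing from the alias table yields `None`: when candidate generation misses, the linker has no chance to recover.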
5/20
Available:
Unlabeled (no gold) text + KB
6/20
7/20
g: scorer for linking
Loss for the linker:
• expects a high link score from the bag in which the gold entity possibly exists
• expects a low link score from the negative-sampled bag
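A minimal sketch of such a loss, assuming a hinge (max-margin) form, a bag scored by its best candidate, and a margin of 0.1; none of these choices is specified on the slide. The input lists stand in for g-scores of each candidate in the positive bag and in the negative-sampled bag.

```python
from typing import Sequence


def bag_score(candidate_scores: Sequence[float]) -> float:
    # Assumption: a bag is scored by its best candidate (max over g-scores).
    return max(candidate_scores)


def linker_hinge_loss(pos_scores: Sequence[float],
                      neg_scores: Sequence[float],
                      margin: float = 0.1) -> float:
    # Zero loss once the positive bag beats the negative bag by `margin`;
    # otherwise the loss grows linearly with the violation.
    return max(0.0, margin - bag_score(pos_scores) + bag_score(neg_scores))
```

With `pos_scores=[0.2, 0.9]` and `neg_scores=[0.3, 0.1]` the margin is satisfied and the loss is 0; with `[0.4]` versus `[0.5]` it is about 0.2.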
8/20
Under “supervised” settings:
If candidate generation fails to retrieve the gold entity,
we can simply add the gold entity to the bag (Shikhar et al., ACL’18).
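The supervised fix can be sketched as follows; `supervised_bag` is a hypothetical helper, and entity IDs are plain strings for illustration.

```python
from typing import List


def supervised_bag(candidates: List[str], gold: str) -> List[str]:
    """Return the candidate bag, appending the gold entity if candidate
    generation missed it. Only possible when gold labels are available."""
    bag = list(candidates)  # copy so the caller's list is left untouched
    if gold not in bag:
        bag.append(gold)
    return bag
```

This guarantees every training bag contains the gold entity, which is exactly the guarantee the distant setting cannot provide.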
9/20
Under “distant” settings:
We cannot know whether the candidate bag contains the gold entity or not.
But to train g only on valid data points, we want to know (i.e., classify) this.
10/20
Representation for the E+ bag:
Again, we expect g to put a high score on the gold (or a nearby) entity.
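One plausible way to build the E+ bag representation, assumed here rather than taken from the slides, is attention: weight each candidate's embedding by a softmax over its g-scores and sum. Candidates that g scores highly then dominate the bag vector. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def bag_representation(entity_vecs: Sequence[Sequence[float]],
                       g_scores: Sequence[float]) -> List[float]:
    # Attention weights over the E+ candidates, derived from their g-scores.
    weights = softmax(g_scores)
    dim = len(entity_vecs[0])
    # Weighted sum of candidate embeddings, one dimension at a time.
    return [sum(w * v[d] for w, v in zip(weights, entity_vecs))
            for d in range(dim)]
```

With equal scores the bag vector is the plain average of the candidates; as one candidate's score grows, the bag vector approaches that candidate's embedding.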
11/20
Noisy/valid E+ classifier pN
Inputs: E+ bag representation, contextualized mention
pN: classifies whether the bag for a mention is ‘noisy’ (1) or ‘valid’ (0)
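A stand-in for pN, assuming a logistic-regression classifier over the concatenated E+ bag representation and contextualized mention vector; the actual architecture, the name `p_noisy`, its weights, and bias are hypothetical. It returns the probability of label 1, assumed here to mean ‘noisy’. No mention-candidate surface-similarity feature is used.

```python
import math
from typing import Sequence


def p_noisy(bag_rep: Sequence[float], mention_vec: Sequence[float],
            weights: Sequence[float], bias: float) -> float:
    # Hypothetical pN: logistic regression over [bag_rep ; mention_vec].
    features = list(bag_rep) + list(mention_vec)
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature size")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

In a real model the weights would be learned jointly with the linker; here they are fixed inputs only to show the data flow.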
12/20
NOTE: pN does not take mention-candidate surface similarity as an input.
13/20
P2. LEFT
14/20
Loss for training pN (noisy/valid bag classifier) jointly with the linker
loss = Σ over mentions of [valid (not noisy) probability] × [link loss]
Bags that are possibly valid (i.e., the gold entity exists in them)
contribute their link loss to training the linker, but…
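A sketch of this weighted sum, assuming pN is the probability that a bag is noisy, so the valid probability is 1 - pN; `weighted_link_loss` is a hypothetical name, and the per-mention link losses are taken as given.

```python
from typing import Sequence


def weighted_link_loss(p_noisy: Sequence[float],
                       link_losses: Sequence[float]) -> float:
    # Each mention's link loss is weighted by the probability that its
    # E+ bag is valid (1 - pN), then summed over mentions.
    if len(p_noisy) != len(link_losses):
        raise ValueError("need one pN value per link loss")
    return sum((1.0 - p) * loss for p, loss in zip(p_noisy, link_losses))
```

Setting every pN to 1 makes this sum exactly 0 whatever the link losses are, which is the degenerate solution the slides go on to warn about.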
15/20
…assigning ‘noisy’ to all bags trivially drives this loss to 0,
so neither the linker nor the bag classifier can be trained.
16/20
An additional term ties two quantities together:
• a hyperparameter expressing our beliefs about noisy data points (e.g. 0.9)
• the mean noisiness (mean of pN) over a document
17/20
By adding this term, we expect the linker to be trained on data
in which the gold entity very likely exists.
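The slides do not give the exact form of the extra term. As an assumption, the sketch below uses a KL divergence between Bernoulli(eta) and Bernoulli(mean pN over the document), scaled by a hypothetical weight `lam`; `eta` plays the role of the belief hyperparameter (0.9 is the slide's example value). As before, pN is assumed to be the probability that a bag is noisy.

```python
import math
from typing import Sequence


def bernoulli_kl(p: float, q: float, eps: float = 1e-7) -> float:
    # KL(Bernoulli(p) || Bernoulli(q)), with q clipped away from 0 and 1.
    q = min(max(q, eps), 1.0 - eps)
    kl = 0.0
    if p > 0.0:
        kl += p * math.log(p / q)
    if p < 1.0:
        kl += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return kl


def document_loss(p_noisy: Sequence[float], link_losses: Sequence[float],
                  eta: float = 0.9, lam: float = 1.0) -> float:
    if not p_noisy or len(p_noisy) != len(link_losses):
        raise ValueError("need a non-empty document with one pN per loss")
    # Link losses weighted by the valid probability 1 - pN.
    weighted = sum((1.0 - p) * loss for p, loss in zip(p_noisy, link_losses))
    # Document-level mean noisiness, pulled toward eta by the assumed KL term.
    mean_noisiness = sum(p_noisy) / len(p_noisy)
    return weighted + lam * bernoulli_kl(eta, mean_noisiness)
```

Under this assumed regularizer, labeling every bag ‘noisy’ no longer yields zero loss: the KL term penalizes a document mean of pN that drifts away from `eta`, so some bags must stay valid and keep contributing link loss.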
18/20
19/20
Table 3: Linker error rate on the dev set
Blue: denoising succeeded
Red: denoising failed due to a flaw in candidate generation
ND: denoising bags when training the linker; success = catching the signal of the gold entity in the bag
20/20
Figure 3: confirms that pN separates valid and noisy data
(noisy: a bag in which the gold entity does not exist).