Entity representation with relational attention

izuna385, 2019-11-26

Motivation (new)
• Not relying on massive unlabeled text,
• by leveraging both KG relations and texts,
• get (densified) entity representations
• for entity linking,
• coping with the KB-scale problem.

How to get entity representations?
• Massive approach
• Solving-KGC approach
• Solving-KGC-with-text approach

Massive approach
[Figure: decision tree over prior work. Questions asked: Use massive text? (Yes/No); Require “entity-span” annotations? (Yes/No); Use relations? (Yes/No). Methods shown: JointEnt [Yamada et al., ACL ’17], KnowBert [Peters et al., EMNLP ’19] (partially annotated data used), DEER [Gillick et al., CoNLL ’19], ERNIE [Zhang et al., ACL ’19], BertEnt [Yamada et al., ’19], EntEval [Chen et al., EMNLP ’19].]
• Needs lots of mention-entity annotations.

Solving KGC (with text) approach
[Figure: the decision tree above, extended with one more question: Relations or adding text?]
• Relations only (solving-KGC approach): Multihop-GAT [Nathani et al., EMNLP ’19], TuckER [Balažević et al., EMNLP ’19], ComplEx, DistMult, … (SoTA)
• Relations + text (solving-KGC-with-text approach): AATE [An et al., NAACL ’18], MutualAtt [Han et al., AAAI ’18], …
• Closed setting
• Scaling problem not considered

Problems with KGC-based entity embeddings
• Only small KBs are considered, under a closed setting.
• For example, [Nathani et al., EMNLP ’19] requires 2-hop attention, which has a high computational cost when applied to a real KB (~1,000,000).
• With a real KB, we cannot backpropagate through a full softmax. [Balkır et al., EMNLP ’19]

Solving KGC with text approach
[Figure: the decision tree again. The relations + text methods are AATE [An et al., NAACL ’18], MutualAtt [Han et al., AAAI ’18], …]

[Han et al., AAAI ’18]
• Needs many sentences that each contain two entities. In practice, they use the NYTimes corpus with distant supervision.
• Attention between relations and entity definitions.
• Not closed in the KB.

Intuition
• Why not use the information in the entity strings and definitions themselves?
[Figure labels: Definitions; Relations (triplets)]

Intuition
• def. text + rel. --> def. text [Xu et al., IJCAI ’16]
[Figure labels: from KG; from KG]

Scoring Function
• TuckER [Balažević et al., EMNLP ’19].
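As a rough illustration, TuckER scores a triple by contracting a shared core tensor W with the head, relation, and tail embeddings: score = Σ_ijk W[i][j][k] · e_s[i] · w_r[j] · e_o[k]. The sketch below is pure Python. The name tucker_score, the nested-list layout, and the toy values are assumptions for illustration, not the deck's implementation.

```python
def tucker_score(core, head, rel, tail):
    """TuckER-style score: sum over i, j, k of core[i][j][k] * head[i] * rel[j] * tail[k].

    core: nested list of shape (d_e, d_r, d_e)
    head, tail: entity vectors of length d_e
    rel: relation vector of length d_r
    """
    score = 0.0
    for i, h in enumerate(head):
        for j, r in enumerate(rel):
            for k, t in enumerate(tail):
                score += core[i][j][k] * h * r * t
    return score


# Toy values (hypothetical): d_e = 2, d_r = 2.
core = [[[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
head, rel, tail = [1.0, 2.0], [0.5, -1.0], [3.0, 1.0]
print(tucker_score(core, head, rel, tail))
```

In the DefinitionTuckER setting of this deck, head and tail would be encoded definition sentences rather than rows of an entity embedding table.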

Baseline Model: DefinitionTuckER
• The encoded definition sentence is used as the entity embedding.
• Baseline: no interaction between entity and relation.
[Figure labels: Encoder, Encoder; linear, linear]

Previous entity-relation interaction studies
• They consider interaction inside the score function (too many to list).
• How about interaction between the entity itself and the relation?

Proposal
• The encoded definition sentence is used as the entity embedding.
• Proposal: relational attention to the entity definition.
[Figure labels: Encoder, Encoder]

Label Attention to sentence
• [Wang et al., ACL ’18]
[Figure: sentence seq. to sentence emb.; sentence seq. and label emb. matrix (learned) to label-attended sentence emb. β: label-emb. attention.]

Relation Attention to entity definition
• [Wang et al., ACL ’18]
• The sentence seq. is the entity definition, and the sentence emb. is the entity emb.
• Use the relation emb. matrix in place of the (learned) label emb. matrix.
• Get a relation-attended entity emb.
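A minimal sketch of relation attention over a definition, assuming a simplified form of the label attention in [Wang et al., ACL ’18]: each token is weighted by the softmax of its cosine similarity to a single relation embedding, and the weighted sum is taken as the entity embedding. The function names and toy vectors are hypothetical, and any extra steps of the original label-attention model are omitted.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relation_attended_embedding(token_vecs, rel_vec):
    """Return (entity_emb, beta): beta[t] is the attention weight of token t."""
    beta = softmax([cosine(t, rel_vec) for t in token_vecs])
    dim = len(token_vecs[0])
    emb = [sum(b * t[d] for b, t in zip(beta, token_vecs)) for d in range(dim)]
    return emb, beta


# Toy definition of three tokens in a 2-d space (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rel = [1.0, 0.0]
emb, beta = relation_attended_embedding(tokens, rel)
print(beta, emb)
```

The same definition therefore yields a different entity embedding for each relation, which is the interaction the baseline lacks.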

Training
• In-batch negative sampling [Henderson et al., ’17; Gillick et al., ’18]
[Figure: one batch of (head def., rel., tail def.) rows. For each row, its own tail is the gold and the other tails in the batch are negatives.]
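In-batch negative sampling can be sketched as a softmax cross-entropy over a B×B score matrix whose diagonal holds the gold pairs. The layout scores[i][j] (row i's head and relation scored against row j's tail) and the function name are assumptions for illustration.

```python
import math


def in_batch_loss(scores):
    """Mean cross-entropy where scores[i][j] scores (head_i, rel_i) against tail_j.

    The gold tail for row i is column i. Every other column is a negative.
    """
    losses = []
    for i, row in enumerate(scores):
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        losses.append(log_z - row[i])
    return sum(losses) / len(losses)


# A batch where the gold tails score highest gives a small loss.
good = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
# A batch with uninformative scores gives log(batch_size).
flat = [[0.0, 0.0, 0.0] for _ in range(3)]
print(in_batch_loss(good), in_batch_loss(flat))
```

No separate negative sampler is needed, so the cost per step depends on the batch size rather than on the number of entities in the KB.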

Experiment
• Dataset: DBPedia50k
• Task: predict (entity def. + rel. --> ?) on the test data.
• Evaluation: standard KGC settings.

Experiment setting
• Baseline: definition encoder: ELMo + StackedAttLSTM.
• Proposal: definition encoder: ELMo + StackedAttLSTM + RelationAtt.
[Figure labels: Attention sum; Attention sum]

Result@test

Model                  Hits@1   Hits@10
Baseline               0.0      0.0
Baseline + Rel att.    1.2      7.6

• Reference: ConMask [Shi and Weninger, AAAI ’18]: 81.0
• Evaluation: 1-to-all entities
• Test data: 10,969 triplets
NOTE:
• My experiment ran for only 20 iterations.
• For example, TuckER [Balažević et al., ’19] requires 500 iterations.

In-batch Evaluation
• In-batch: 1-to-batch-size classification
[Figure: one batch of (head def., rel., tail def.) rows, with softmaxed scores over the tails in the batch: 0.12, 0.05, 0.31, 0.22, 0.01, 0.13, and the gold tail marked.]
• In this case, the model failed to predict the tail.

Result@test

Model                  Hits@1
Baseline               0.7
Baseline + Rel att.    46.67

• In-batch evaluation
• batch_size: 128 (= 1-to-128 classification problem)
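For reference, Hits@k as reported in the result tables can be computed as below. The only difference between the 1-to-all and in-batch settings is the candidate set being ranked. The scores reuse the in-batch example (0.12, 0.05, 0.31, 0.22, 0.01, 0.13). The gold index is an assumption, since the transcript does not preserve which candidate was gold, and ties are resolved optimistically here.

```python
def rank_of_gold(scores, gold_index):
    """1-based rank of the gold candidate. Higher score is better, ties count in favor of gold."""
    gold = scores[gold_index]
    return 1 + sum(1 for s in scores if s > gold)


def hits_at_k(all_scores, gold_indices, k):
    """Fraction of queries whose gold candidate is ranked within the top k."""
    hits = sum(
        1
        for scores, gold in zip(all_scores, gold_indices)
        if rank_of_gold(scores, gold) <= k
    )
    return hits / len(gold_indices)


# One query over six candidates. Gold index 3 is a hypothetical choice.
all_scores = [[0.12, 0.05, 0.31, 0.22, 0.01, 0.13]]
gold_indices = [3]
print(hits_at_k(all_scores, gold_indices, 1))
print(hits_at_k(all_scores, gold_indices, 2))
```

With this assumed gold index the query is a miss at k = 1, because the candidate scored 0.31 outranks the gold one.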

Conclusion
• Definition-only KGC seems to fail.
• Maybe another score function can improve the results.
• Do we need a “solely-KG-based” emb. signal, too?

Do we need a solely-KG-based emb., too?
• A very simple model [Shah et al., AAAI ’19] gets good results.

[Shah et al., AAAI ’19] model
• 1. Get a text-based entity emb. by averaging word embeddings.
• 2. Train a KG-based entity emb., independently of the text.
• 3. Learn a projection from the text-entity emb. to the KG emb.
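The three steps can be sketched as follows, assuming the projection is an affine map (the deck only says "projection"). The learning step is omitted, and the word vectors, matrix, and bias are hypothetical hand-set values rather than trained parameters.

```python
def average_embedding(tokens, word_vecs):
    """Step 1: text-based entity embedding as the mean of the known word vectors."""
    dim = len(next(iter(word_vecs.values())))
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    if not vecs:
        return [0.0] * dim
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def project(matrix, bias, vec):
    """Step 3: map a text embedding into the KG embedding space (affine map assumed)."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(matrix, bias)]


# Hypothetical 2-d word vectors and a 2-d to 3-d projection.
word_vecs = {"river": [1.0, 0.0], "in": [0.0, 0.0], "france": [0.0, 1.0]}
matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [0.0, 0.0, 0.5]

text_emb = average_embedding(["river", "in", "france", "unseenword"], word_vecs)
kg_emb = project(matrix, bias, text_emb)
print(text_emb, kg_emb)
```

Step 2 (training the KG embedding) is independent of this code: the projected vector is simply passed to whatever KGC scorer was trained on the graph.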

[Shah et al., AAAI ’19] model
• Evaluation: open-world setting. Test data: unknown entity + its definition.
• The definition is encoded by word emb. averaging, then projected to the KG space by the learned projection.

[Shah et al., AAAI ’19] model
• Result: [results figure not preserved in the transcript]
• Unnatural filtering

Conclusion
• Definition-only KGC seems to fail.
• Maybe we need a “solely-KG-based” emb. signal, too.

Supplementation

Previous entity rep. studies (~CoNLL ’19)
• Entity: existing in a KB (e.g. Wikipedia, Freebase, etc.)
[Figure: full decision tree. Questions asked: Use massive text? (Yes/No); Require “entity-span” annotations? (Yes/No); Use relations? (Yes/No); Relations or adding text? Methods shown: JointEnt [Yamada et al., ACL ’17], KnowBert [Peters et al., EMNLP ’19] (partially annotated data used), DEER [Gillick et al., CoNLL ’19], ERNIE [Zhang et al., ACL ’19], BertEnt [Yamada et al., ’19], EntEval [Chen et al., EMNLP ’19]. Relations: Multihop-GAT [Nathani et al., EMNLP ’19], TuckER [Balažević et al., EMNLP ’19], ComplEx, DistMult, … (SoTA). Relations + text: AATE [An et al., NAACL ’18], MutualAtt [Han et al., AAAI ’18], …]

Scaling problem
• Nodes: 2,575,340; edges: 24,862,972; relation kinds: 599
• The computational cost is too high, so the experiment cannot be run.
• The cost arises when backpropagating a full softmax to all entity embeddings.
[Figure: entity emb. matrix scored against (ent., rel.); softmax scores 0.02, 0.14, 0.21, 0.01, 0.13, 0.05; forward and backward passes.]
• Full softmax over all KB entities: NOT realistic.
• On a mini-dataset, we can run a full softmax over all entities.
• How about a large-scale dataset?

KGC with large-scale data
• Only a few studies exist (!) [Balkır et al., EMNLP ’19]
• The studies shown both use sampled softmax.
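A minimal sketch of why sampled softmax helps at KB scale: the loss is normalized over the gold entity plus a few sampled negatives, so only those embeddings receive gradients instead of all entities. Uniform sampling with replacement and the omission of the usual sampling-probability correction are simplifying assumptions, and score_fn is a hypothetical stand-in for the scoring model.

```python
import math
import random


def sampled_softmax_loss(score_fn, gold, num_entities, num_samples, rng):
    """Cross-entropy over [gold] + sampled negatives instead of all entities.

    Requires num_entities > 1. Negatives are drawn uniformly, with replacement.
    """
    negatives = []
    while len(negatives) < num_samples:
        candidate = rng.randrange(num_entities)
        if candidate != gold:
            negatives.append(candidate)
    candidates = [gold] + negatives
    scores = [score_fn(c) for c in candidates]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0], candidates


rng = random.Random(0)
# With a constant scorer the loss is log(1 + num_samples), independent of KB size.
loss, candidates = sampled_softmax_loss(
    lambda entity: 0.0, gold=7, num_entities=1000, num_samples=4, rng=rng
)
print(loss, candidates)
```

Only 1 + num_samples scores are computed per query here, versus num_entities for the full softmax.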

Where does my interest lie?
• Not relying on massive unlabeled text,
• by leveraging both KG relations and texts,
• get (densified) entity representations
• for entity linking,
• coping with the KB-scale problem.
