Slide 1

Slide 1 text

Entity representation with relational attention
izuna385, 2019-11-26

Slide 2

Slide 2 text

Motivation (new)
• Not relying on massive unlabeled text,
• by leveraging both KG relations and texts,
• get a (densified) entity representation
• for Entity Linking,
• while coping with the KB-scale problem.

Slide 3

Slide 3 text

How to get entity representations?
• Massive approach
• Solving KGC approach
• Solving KGC with text approach

Slide 4

Slide 4 text

Massive approach
[Flowchart: taxonomy of entity-representation approaches. Decision points: Use massive text? → Require “entity-span” annotations? → Use relations? Methods placed in the tree: JointEnt [Yamada et al., ACL ’17], KnowBert [Peters et al., EMNLP ’19] (partially annotated data used), DEER [Gillick et al., CoNLL ’19], ERNIE [Zhang et al., ACL ’19], BertEnt [Yamada et al., ’19], EntEval [Chen et al., EMNLP ’19]. The massive-text branch is highlighted.]

Slide 5

Slide 5 text

Massive approach
[Same flowchart as Slide 4, massive-text branch highlighted.]
• Needs lots of mention-entity annotations.

Slide 6

Slide 6 text

Solving KGC (with text) approach
[Same flowchart as Slide 4, now extending the no-massive-text branch with the question: relations only, or relations + text? Relations only → solving-KGC approach; relations + text → solving-KGC-with-text approach.]

Slide 7

Slide 7 text

Solving KGC (with text) approach
[Same flowchart; the relations-only branch lists Multihop-GAT [Nathani et al., EMNLP ’19], TuckER [Balažević et al., EMNLP ’19], ComplEx, DistMult, … (SoTA).]

Slide 8

Slide 8 text

Solving KGC (with text) approach
[Same flowchart; the relations + text branch lists AATE [An et al., NAACL ’18], MutualAtt [Han et al., AAAI ’18], ….]

Slide 9

Slide 9 text

Solving KGC (with text) approach
[Same flowchart as Slides 7-8.]
• Closed setting.
• Scaling problem not considered.

Slide 10

Slide 10 text

Problems with KGC-based entity embeddings
• Only small KBs are considered, under a closed setting.
• For example, [Nathani et al., EMNLP ’19] requires 2-hop attention: the computational cost becomes high when it is applied to a real KB (~1,000,000 entities).
• With a real KB, we cannot backpropagate through a full softmax over all entities [Balkır et al., EMNLP ’19].

Slide 11

Slide 11 text

Solving KGC with text approach
[Same flowchart as Slide 8, with the relations + text branch (AATE, MutualAtt) highlighted.]

Slide 12

Slide 12 text

Attention between relation and entity definition
• [Han et al., AAAI ’18]
• It needs many sentences that each contain the two entities; in practice, they use the NYTimes corpus + distant supervision.

Slide 13

Slide 13 text

Attention between relation and entity definition
• [Han et al., AAAI ’18]
• It needs many sentences that each contain the two entities; in practice, they use the NYTimes corpus + distant supervision.
• Not closed within the KB.

Slide 14

Slide 14 text

Intuition
• Why not use the information from the entity strings and the definitions themselves?
[Figure: definitions and relations (triplets).]

Slide 15

Slide 15 text

Intuition
• definition text + relation → definition text [Xu et al., IJCAI ’16]
• (both the definition text and the relation come from the KG)

Slide 16

Slide 16 text

Scoring function
• TuckER [Balažević et al., EMNLP ’19] (sketch below).
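As a reminder of how the TuckER score is computed, here is a minimal sketch; the embedding sizes d_e and d_r below are placeholders I chose for illustration, not the values used in the talk.

```python
# Minimal sketch of the TuckER scoring function [Balažević et al., EMNLP '19]:
# score(s, r, o) = W x_1 e_s x_2 w_r x_3 e_o, with a learned core tensor W.
import torch

d_e, d_r = 200, 30                      # assumed embedding sizes
W = torch.randn(d_e, d_r, d_e)          # core tensor (learned)
e_s = torch.randn(d_e)                  # subject entity embedding
w_r = torch.randn(d_r)                  # relation embedding
e_o = torch.randn(d_e)                  # object entity embedding

score = torch.einsum('irj,i,r,j->', W, e_s, w_r, e_o)
prob = torch.sigmoid(score)             # probability that the triple holds
```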

Slide 17

Slide 17 text

Baseline model: DefinitionTuckER
• The encoded definition sentence is used as the entity embedding (with a linear layer on each of the head and tail sides).

Slide 18

Slide 18 text

Baseline model: DefinitionTuckER
• The encoded definition sentence is used as the entity embedding (one encoder per side); a sketch follows.
• Baseline: no interaction between the entity and the relation.
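A minimal sketch of the DefinitionTuckER baseline as I read this slide: encode each definition, project it to the entity dimension, and plug it into the TuckER score. The encoder output size and all dimensions are assumptions.

```python
# Sketch of the DefinitionTuckER baseline: definition embedding -> linear projection
# -> TuckER score. Dimensions and the encoder choice are assumptions.
import torch
import torch.nn as nn

class DefinitionTuckER(nn.Module):
    def __init__(self, enc_dim=1024, d_e=200, d_r=30, n_rel=600):
        super().__init__()
        self.proj = nn.Linear(enc_dim, d_e)                       # definition emb. -> entity emb.
        self.rel = nn.Embedding(n_rel, d_r)                       # relation embeddings
        self.W = nn.Parameter(torch.randn(d_e, d_r, d_e) * 0.1)   # TuckER core tensor

    def forward(self, head_def_emb, rel_id, tail_def_emb):
        e_s = self.proj(head_def_emb)                             # (B, d_e)
        e_o = self.proj(tail_def_emb)                             # (B, d_e)
        w_r = self.rel(rel_id)                                    # (B, d_r)
        # batched TuckER score: W x_1 e_s x_2 w_r x_3 e_o
        return torch.einsum('irj,bi,br,bj->b', self.W, e_s, w_r, e_o)
```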

Slide 19

Slide 19 text

Previous entity-relation interaction studies
• They consider the interaction inside the score function (too many to list).

Slide 20

Slide 20 text

Previous entity-relation interaction studies
• They consider the interaction inside the score function (too many to list).
• How about an interaction between the entity itself and the relation?

Slide 21

Slide 21 text

Proposal
• The encoded definition sentence is used as the entity embedding (one encoder per side).
• Proposal: relational attention over the entity definition.

Slide 22

Slide 22 text

Label attention over a sentence
• [Wang et al., ACL ’18]
• A learned label-embedding matrix attends over the sentence token sequence; the attention weights β produce a label-attended sentence embedding.

Slide 23

Slide 23 text

Relation attention over the entity definition
• Same mechanism as [Wang et al., ACL ’18], but the relation-embedding matrix is used in place of the label-embedding matrix: it attends over the entity-definition token sequence to produce a relation-attended entity embedding (sketch below).
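Here is a minimal sketch of the relation attention on Slides 22-23 as I understand it; the exact shapes and pooling are assumptions, and the relation embedding is the one from the triple currently being scored.

```python
# LEAM-style relation attention over the definition tokens: the relation embedding
# drives attention weights beta over the token states of the definition sentence.
import torch
import torch.nn.functional as F

def relation_attended_entity_emb(def_states, rel_emb):
    """
    def_states: (T, d) token states of the entity-definition sentence
    rel_emb:    (d,)   embedding of the relation in the current triple
    returns:    (d,)   relation-attended entity embedding
    """
    beta = F.softmax(def_states @ rel_emb, dim=0)        # (T,) attention over tokens
    return (beta.unsqueeze(1) * def_states).sum(dim=0)   # weighted sum of token states
```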

Slide 24

Slide 24 text

Training
• In-batch negative sampling [Henderson et al., ’17; Gillick et al., ’18].
[Figure: one batch of (head definition, relation, tail definition) triples.]

Slide 25

Slide 25 text

Training
• In-batch negative sampling [Henderson et al., ’17; Gillick et al., ’18].
[Figure: within one batch, each (head definition, relation) pair’s own tail is the gold; the other tails in the batch serve as negatives. A sketch follows.]
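A minimal sketch of in-batch negative sampling as depicted here: each (head, relation) pair is scored against every tail in the batch, the diagonal is gold, and everything else is a negative. Using cross-entropy over the batch is my assumption; the slide does not name the loss.

```python
# In-batch negative sampling: build a (B, B) score matrix where row i scores
# (head_i, rel_i) against every tail_j; the gold tail sits on the diagonal.
import torch
import torch.nn.functional as F

def in_batch_loss(head_emb, rel_emb, tail_emb, score_fn):
    """head_emb, tail_emb: (B, d_e); rel_emb: (B, d_r); score_fn returns one scalar per triple."""
    B = head_emb.size(0)
    scores = torch.stack([
        torch.stack([score_fn(head_emb[i], rel_emb[i], tail_emb[j]) for j in range(B)])
        for i in range(B)
    ])                                   # (B, B)
    labels = torch.arange(B)             # gold tail for row i is column i
    return F.cross_entropy(scores, labels)
```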

Slide 26

Slide 26 text

Experiment
• Dataset: DBPedia50k
• Task: predict (entity definition + relation → ?) on the test data.
• Evaluation: standard KGC settings.

Slide 27

Slide 27 text

Experiment setting
• Baseline: definition encoder = ELMo + StackedAttLSTM + attention sum.
• Proposal: definition encoder = ELMo + StackedAttLSTM + relation attention + attention sum.

Slide 28

Slide 28 text

Results @ test (evaluation: 1-to-all entities; test data: 10,969 triplets)

Model                    Hits@1   Hits@10
Baseline                 0.0      0.0
Baseline + relation att. 1.2      7.6
(Reference) ConMask [Shi and Weninger, AAAI ’18]: 81.0

NOTE:
• My experiment ran for only 20 iterations.
• For example, TuckER [Balažević et al., ’19] requires 500 iterations.

Slide 29

Slide 29 text

In-batch evaluation
• In-batch: 1-to-batch-size classification (sketch below).
[Figure: for one (head definition, relation) pair, softmaxed scores over the batch’s tails (e.g. 0.12, 0.05, 0.31, 0.22, 0.01, 0.13); the gold tail does not receive the highest score, so in this case the model fails to predict the tail.]
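A minimal sketch of this in-batch Hits@1 evaluation, reusing the (B, B) score matrix from the training sketch above; the softmax step mirrors the figure.

```python
# In-batch Hits@1: a row counts as correct only if the gold (diagonal) tail gets
# the highest softmaxed score among the batch's tails.
import torch

def in_batch_hits_at_1(scores):
    probs = torch.softmax(scores, dim=1)       # (B, B) per-row distribution
    pred = probs.argmax(dim=1)                 # predicted tail index per row
    gold = torch.arange(scores.size(0))        # gold tail index is the row index
    return (pred == gold).float().mean().item()
```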

Slide 30

Slide 30 text

Results @ test (in-batch evaluation; batch size 128, i.e. a 1-to-128 classification problem)

Model                    Hits@1
Baseline                 0.7
Baseline + relation att. 46.67

Slide 31

Slide 31 text

Conclusion
• Definition-only KGC seems to fail.
• Maybe another score function could improve the results.
• Do we also need a signal from “solely-KG-based” embeddings?

Slide 32

Slide 32 text

Do we need solely-KG-based embeddings, too?
• A very simple model [Shah et al., AAAI ’19] gets good results.

Slide 33

Slide 33 text

[Shah et al., AAAI ’19] model
• 1. Get a text-based entity embedding by averaging word embeddings.

Slide 34

Slide 34 text

[Shah et al., AAAI ’19] model
• 2. Train KG-based entity embeddings, independently of the text.

Slide 35

Slide 35 text

[Shah et al., AAAI ’19] model
• 3. Learn a projection from the text-based entity embedding to the KG embedding (the projection is learned; sketch below).
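A minimal sketch of step 3 under my reading of this slide: fit a learned map from the averaged-word-embedding representation to the pretrained KG embedding of the same entity. A linear map trained with MSE is an assumption; the slide only says the projection is learned.

```python
# Learn a projection text-entity emb. -> KG emb. over entities seen in training.
import torch
import torch.nn as nn

def train_projection(text_embs, kg_embs, epochs=100, lr=1e-3):
    """text_embs: (N, d_text); kg_embs: (N, d_kg) for in-KB training entities."""
    proj = nn.Linear(text_embs.size(1), kg_embs.size(1))
    opt = torch.optim.Adam(proj.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(proj(text_embs), kg_embs)
        loss.backward()
        opt.step()
    return proj   # at test time: proj(averaged word emb. of an unseen entity's definition)
```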

Slide 36

Slide 36 text

[Shah et al., AAAI ’19] model
• Evaluation: open-world setting. Test data: an unknown entity + its definition.

Slide 37

Slide 37 text

[Shah et al., AAAI ’19] model
• Evaluation: open-world setting. Test data: an unknown entity + its definition.
• At test time: word-embedding averaging.

Slide 38

Slide 38 text

[Shah et al., AAAI ’19] model
• Evaluation: open-world setting. Test data: an unknown entity + its definition.
• At test time: word-embedding averaging, then project into the KG space with the learned projection.

Slide 39

Slide 39 text

[Shah et al., AAAI ’19] model
• Evaluation: open-world setting. Test data: an unknown entity + its definition.
• At test time: word-embedding averaging, then project into the KG space with the learned projection (same content as Slide 38).

Slide 40

Slide 40 text

[Shah et al., AAAI ’19] model
• Result: (results table from the paper, shown on the slide).

Slide 41

Slide 41 text

[Shah et al., AAAI ’19] model
• Unnatural filtering.

Slide 42

Slide 42 text

Conclusion
• Definition-only KGC seems to fail.
• Maybe we also need a signal from “solely-KG-based” embeddings.

Slide 43

Slide 43 text

Supplementary material

Slide 44

Slide 44 text

Previous entity representation studies (up to CoNLL ’19)
• Entity: one that exists in a KB (e.g. Wikipedia, Freebase, etc.)
[Full flowchart from Slides 4-8: Use massive text? / Require “entity-span” annotations? / Use relations? / Relations only or relations + text?, covering JointEnt, KnowBert, DEER, ERNIE, BertEnt, EntEval, Multihop-GAT, TuckER, ComplEx, DistMult, AATE, and MutualAtt.]

Slide 45

Slide 45 text

Scaling problem
• Nodes: 2,575,340; edges: 24,862,972; relation types: 599.
• The computational cost is too high; the experiment cannot be conducted.

Slide 46

Slide 46 text

Scaling problem
• Nodes: 2,575,340; edges: 24,862,972; relation types: 599.
• The computational cost is too high when backpropagating a full softmax over all entity embeddings; the experiment cannot be conducted.
[Figure: the forward pass scores every entity in the embedding matrix against (entity, relation) and softmaxes the scores; the backward pass then touches every row of the entity-embedding matrix.]

Slide 47

Slide 47 text

Scaling problem
• Nodes: 2,575,340; edges: 24,862,972; relation types: 599.
• The computational cost is too high when backpropagating a full softmax over all entity embeddings; the experiment cannot be conducted.
[Same figure as Slide 46.]
• A full softmax over all KB entities is NOT realistic (see the back-of-the-envelope sketch below).
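A back-of-the-envelope sketch of why the full softmax is not realistic at this scale; the embedding dimension (200) and float32 storage are my assumptions for illustration.

```python
# Rough cost estimate for a full softmax over all KB entities (Slides 45-47).
n_entities, dim, bytes_per_float = 2_575_340, 200, 4

emb_matrix_bytes = n_entities * dim * bytes_per_float
print(f"entity embedding matrix: {emb_matrix_bytes / 1e9:.2f} GB")   # ~2.06 GB

# Each full-softmax step scores every entity and backpropagates into every row
# of this matrix, so the per-triple cost grows linearly with the KB size.
print(f"scores + gradient rows per triple per step: {n_entities:,}")
```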

Slide 48

Slide 48 text

Scaling problem
• Nodes: 2,575,340; edges: 24,862,972; relation types: 599.
• The computational cost is too high; the experiment cannot be conducted.
• With a mini dataset, we can run a full softmax over all entities.
• How about a large-scale dataset?

Slide 49

Slide 49 text

KGC with large-scale data
• Only a few studies exist(!) [Balkır et al., EMNLP ’19]

Slide 50

Slide 50 text

KGC with large-scale data
• Only a few studies exist(!) [Balkır et al., EMNLP ’19]
• Both use sampled softmax (sketch below).
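A minimal sketch of the sampled-softmax idea mentioned here: normalize over the gold entity plus K sampled negatives instead of over all ~2.5M entities. Uniform negative sampling and a cross-entropy loss are assumptions on my part.

```python
# Sampled softmax: score the gold tail against K sampled negatives instead of
# computing a full softmax over every entity in the KB.
import torch
import torch.nn.functional as F

def sampled_softmax_loss(query, entity_emb, gold_idx, k=1024):
    """
    query:      (d,)   representation of the (head, relation) pair
    entity_emb: (N, d) full entity-embedding matrix
    gold_idx:   int    index of the gold tail entity
    """
    neg_idx = torch.randint(entity_emb.size(0), (k,))            # K sampled negatives
    cand_idx = torch.cat([torch.tensor([gold_idx]), neg_idx])    # gold first
    logits = entity_emb[cand_idx] @ query                        # (K+1,) scores
    target = torch.zeros(1, dtype=torch.long)                    # gold is at position 0
    return F.cross_entropy(logits.unsqueeze(0), target)
```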

Slide 51

Slide 51 text

Where does my interest lie?
• Not relying on massive unlabeled text,
• by leveraging both KG relations and texts,
• get a (densified) entity representation
• for Entity Linking,
• while coping with the KB-scale problem.