
Entity representation with relational attention

izuna385
November 26, 2019


Combining TuckER [Balazevic et al., '19] with label attention [Wang et al., '18].

Experiments on obtaining entity representations by leveraging both KG relations and texts.


Transcript

1. Motivation (new): not relying on massive unlabeled text; by leveraging both KG relations and texts, get a (densified) entity representation for Entity Linking while coping with the KB-scale problem.
2. How to get entity representations? Three approaches: the massive-text approach, the solving-KGC approach, and the solving-KGC-with-text approach.
3. Massive-text approach. [Figure: taxonomy of prior work by whether it uses massive text, requires "entity-span" annotations, and uses relations: JointEnt [Yamada et al., ACL '17], KnowBert [Peters et al., EMNLP '19] (partially annotated data used), DEER [Gillick et al., CoNLL '19], ERNIE [Zhang et al., ACL '19], BertEnt [Yamada et al., '19], EntEval [Chen et al., EMNLP '19].]
4. Massive-text approach (same taxonomy figure): these methods need lots of mention-entity annotations.
5. Solving-KGC (with text) approach (same taxonomy figure): the relation-using branch splits into "relations only" vs. "relations + text", i.e. the solving-KGC approach vs. the solving-KGC-with-text approach.
6. Relations-only branch: Multihop-GAT [Nathani et al., EMNLP '19], TuckER [Balažević et al., EMNLP '19], ComplEx, DistMult, … (SoTA).
7. Relations + text branch: AATE [An et al., NAACL '18], MutualAtt [Han et al., AAAI '18], …
8. Both KGC branches share two limitations: a closed setting, and no consideration of the scaling problem.
9. Problems of KGC-based entity embeddings: only a small KB is considered, under a closed setting. For example, [Nathani et al., EMNLP '19] requires 2-hop attention, which is too computationally expensive to apply to a real KB (~1,000,000 entities). Under a real KB, we cannot backpropagate through a full softmax over all entities. [Balkır et al., EMNLP '19]
10. Solving-KGC-with-text approach (same taxonomy figure), highlighting the "relations + text" branch: AATE [An et al., NAACL '18], MutualAtt [Han et al., AAAI '18], …
11. [Han et al., AAAI '18]: attention between relations and entity definitions. It needs many sentences that contain two entities; in practice they use the NYTimes corpus plus distant supervision.
12. [Han et al., AAAI '18] (cont.): the same attention between relations and entity definitions, but the approach is not closed within the KB, since it relies on that external corpus.
13. Intuition: why not use the information from the entity strings and definitions themselves? The KG already provides both definitions and relations (triplets).
14. Intuition: (definition text from the KG) + (relation from the KG) --> definition text, as in [Xu et al., IJCAI '16].
15. Baseline model: DefinitionTuckER. The encoded definition sentence is used as the entity embedding. Baseline: no interaction between entity and relation. (A sketch follows below.)
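To make the baseline concrete, here is a minimal sketch of a DefinitionTuckER-style scorer, assuming a generic definition encoder (embedding + LSTM with mean pooling as a stand-in for the ELMo-based encoder used later); the class and parameter names are illustrative, not the slides' actual implementation.

```python
# Minimal sketch (PyTorch) of a DefinitionTuckER-style baseline; names and sizes
# are illustrative assumptions, not the author's exact model.
import torch
import torch.nn as nn

class DefinitionTuckER(nn.Module):
    def __init__(self, vocab_size, num_relations, ent_dim=200, rel_dim=200):
        super().__init__()
        # Stand-in definition encoder: word embedding + LSTM, mean-pooled.
        self.word_emb = nn.Embedding(vocab_size, ent_dim)
        self.encoder = nn.LSTM(ent_dim, ent_dim, batch_first=True)
        self.rel_emb = nn.Embedding(num_relations, rel_dim)
        # TuckER-style core tensor, shape (rel_dim, ent_dim, ent_dim).
        self.W = nn.Parameter(torch.randn(rel_dim, ent_dim, ent_dim) * 0.01)

    def encode_definition(self, def_token_ids):
        h, _ = self.encoder(self.word_emb(def_token_ids))    # (B, L, d)
        return h.mean(dim=1)                                  # (B, d) entity embedding

    def score(self, head_def_ids, rel_ids, tail_def_ids):
        e_h = self.encode_definition(head_def_ids)            # (B, d)
        e_t = self.encode_definition(tail_def_ids)            # (B, d)
        w_r = self.rel_emb(rel_ids)                           # (B, d_r)
        # Multilinear TuckER-style score: contract the core tensor with the
        # relation, head, and tail embeddings.
        return torch.einsum('bi,ijk,bj,bk->b', w_r, self.W, e_h, e_t)
```

Note that there is no entity-embedding lookup table: the entity vectors come only from the encoded definitions, which is exactly why the baseline has no entity-relation interaction before the score function.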
16. Previous entity-relation interaction studies consider the interaction inside the score function (of which there are too many). How about an interaction between the entity itself and the relation?
17. Proposal: the encoded definition sentence is used as the entity embedding, with relational attention applied to the entity definition.
18. Label attention to a sentence [Wang et al., ACL '18]: the sentence sequence and a learned label-embedding matrix produce label-attention weights β, which yield a label-attended sentence embedding.
19. Relation attention to an entity definition: the same mechanism as [Wang et al., ACL '18], but with the relation-embedding matrix in place of the label-embedding matrix, giving a relation-attended entity embedding. (See the sketch below.)
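A minimal sketch of the relation attention over a definition sequence, following the label-attention pattern of [Wang et al., ACL '18] with the relation embedding as the query; the dot-product scoring and tensor names are assumptions, not the exact formulation on the slide.

```python
# Minimal sketch (PyTorch) of relation attention over definition token states.
import torch
import torch.nn.functional as F

def relation_attended_entity_emb(def_hidden, rel_vec):
    """
    def_hidden: (B, L, d) token-level hidden states of the entity definition.
    rel_vec:    (B, d)    embedding of the query relation.
    Returns a (B, d) relation-attended entity embedding.
    """
    # Attention logits: dot product between each definition token and the relation.
    logits = torch.einsum('bld,bd->bl', def_hidden, rel_vec)   # (B, L)
    beta = F.softmax(logits, dim=1)                            # β: attention over tokens
    # Weighted sum of token states = relation-attended entity embedding.
    return torch.einsum('bl,bld->bd', beta, def_hidden)
```

The contrast with the baseline is that here the relation decides which definition tokens matter, rather than the definition being pooled independently of the relation.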
20. Training: in-batch negative sampling [Henderson et al., '17; Gillick et al., '18]. One batch contains head definitions, tail definitions, and relations.
21. Training (cont.): within a batch, each (head def., rel.) pair treats its own tail as the gold and the other tails in the batch as negatives. (A sketch of the loss follows below.)
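A minimal sketch of the in-batch negative-sampling loss described above: each (head, rel) row scores all tail embeddings in the batch and the diagonal entry is the gold pair. The `score_all_pairs` callable is a placeholder for whatever scoring function is used.

```python
# Minimal sketch (PyTorch) of in-batch negative sampling as a B-way classification.
import torch
import torch.nn.functional as F

def in_batch_loss(head_emb, rel_emb, tail_emb, score_all_pairs):
    """
    head_emb, tail_emb: (B, d)   definition-based entity embeddings.
    rel_emb:            (B, d_r) relation embeddings.
    score_all_pairs:    callable returning a (B, B) matrix with
                        scores[i, j] = score(head_i, rel_i, tail_j).
    """
    scores = score_all_pairs(head_emb, rel_emb, tail_emb)       # (B, B)
    gold = torch.arange(scores.size(0), device=scores.device)   # gold tail = same row index
    return F.cross_entropy(scores, gold)                        # softmax over the batch
```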
22. Experiment. Dataset: DBPedia50k. Task: predict the missing entity in (entity def., rel., ?) on the test data. Evaluation: standard KGC settings. (A sketch of the ranking metric is below.)
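For reference, a minimal sketch of the Hits@k metric under the standard 1-to-all ranking evaluation, assuming a precomputed score matrix over all candidate entities (unfiltered, for brevity).

```python
# Minimal sketch of Hits@k over a (num_test, num_entities) score matrix.
import torch

def hits_at_k(scores, gold_ids, k=10):
    """
    scores:   (N, E) score of every candidate entity for each test triple.
    gold_ids: (N,)   index of the correct entity for each test triple.
    """
    topk = scores.topk(k, dim=1).indices                   # (N, k) top-ranked entities
    hit = (topk == gold_ids.unsqueeze(1)).any(dim=1)       # gold within the top k?
    return hit.float().mean().item()
```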
23. Experiment setting. Baseline: definition encoder = ELMo + StackedAttLSTM + attention sum. Proposal: definition encoder = ELMo + StackedAttLSTM + relation attention + attention sum. (A sketch of the attention-sum pooling is below.)
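A minimal sketch of the attention-sum pooling assumed for the baseline encoder: a learned query vector attends over the token states, in contrast to the proposal, where the relation embedding plays that role. The surrounding ELMo + StackedAttLSTM encoder is not reproduced here.

```python
# Minimal sketch (PyTorch) of attention-sum pooling with a learned query vector;
# this is an assumed stand-in, not the exact baseline pooling layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSum(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))

    def forward(self, hidden):                  # hidden: (B, L, d) token states
        logits = hidden @ self.query            # (B, L) relevance of each token
        alpha = F.softmax(logits, dim=1)        # attention over tokens
        return torch.einsum('bl,bld->bd', alpha, hidden)   # (B, d) pooled embedding
```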
24. Result@test (1-to-all entity evaluation; test data: 10,969 triplets): Baseline: Hits@1 = 0.0, Hits@10 = 0.0; Baseline + Rel att.: Hits@1 = 1.2, Hits@10 = 7.6. For reference, ConMask [Shi and Weninger, AAAI '18] reports 81.0. NOTE: my experiments run for only 20 iterations; TuckER [Balazevic et al., '19], for example, requires 500 iterations.
25. In-batch evaluation: a 1-to-batch-size classification over the head def., tail def., and rel. in one batch. [Figure: softmaxed scores over the batch (0.12, 0.05, 0.31, 0.22, 0.01, 0.13), where the gold tail does not get the highest score.] In that case the model fails to predict the tail. (A sketch of this metric follows below.)
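A minimal sketch of the in-batch Hits@1 computation implied by this evaluation: the prediction for each row is the argmax over the batch scores, and the gold tail sits on the diagonal.

```python
# Minimal sketch of in-batch Hits@1 on a (B, B) batch score matrix.
import torch

def in_batch_hits_at_1(scores):
    """scores: (B, B) batch score matrix (softmaxed or raw)."""
    pred = scores.argmax(dim=1)                              # predicted tail per row
    gold = torch.arange(scores.size(0), device=scores.device)  # gold tail on the diagonal
    return (pred == gold).float().mean().item()
```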
26. Result@test (in-batch evaluation, batch_size = 128, i.e. a 1-to-128 classification problem): Baseline: Hits@1 = 0.7; Baseline + Rel att.: Hits@1 = 46.67.
27. Conclusion: definition-only KGC seems to fail. Maybe another score function can improve the results. Do we also need a "solely-KG-based" embedding signal?
28. Do we need a solely-KG-based embedding too? A very simple model [Shah et al., AAAI '19] gets good results.
29. [Shah et al., AAAI '19] model, step 1: get a text-based entity embedding by averaging word embeddings.
30. [Shah et al., AAAI '19] model, step 2: train a KG-based entity embedding independently from the text.
31. [Shah et al., AAAI '19] model, step 3: learn a projection from the text-based entity embedding to the KG embedding. (A sketch of the three steps follows below.)
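A minimal sketch of the three steps as described on these slides; the module name, the linear projection, and the dimensions are illustrative assumptions rather than the exact [Shah et al., AAAI '19] implementation.

```python
# Minimal sketch (PyTorch) of a text-to-KG projection model in the style described above.
import torch
import torch.nn as nn

class TextToKGProjection(nn.Module):
    def __init__(self, word_dim, kg_dim):
        super().__init__()
        # Step 3: a learned (here linear) projection from text space to KG space.
        self.proj = nn.Linear(word_dim, kg_dim)

    def text_entity_emb(self, word_vecs):       # Step 1: average word embeddings.
        return word_vecs.mean(dim=1)            # (B, L, d_w) -> (B, d_w)

    def forward(self, word_vecs):
        return self.proj(self.text_entity_emb(word_vecs))   # (B, d_kg)

# Step 2 (not shown): KG-based entity embeddings are trained independently of the text;
# the projection is then fit to map averaged word vectors onto those KG embeddings, so an
# unseen entity with only a definition can be placed in KG space at test time.
```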
32. [Shah et al., AAAI '19] evaluation: open-world setting. Test data: an unknown entity plus its definition.
33. Evaluation (cont.): the unknown entity's definition is turned into a text embedding by word-embedding averaging.
34. Evaluation (cont.): the averaged embedding is then projected into the KG space by the learned projection.
35. (Animation step repeating the previous slide: word-embedding averaging, then projection into the KG space.)
36. Conclusion: definition-only KGC seems to fail. Maybe we also need a "solely-KG-based" embedding signal.
37. Previous entity-representation studies (~CoNLL '19), where the entities already exist in a KB (e.g. Wikipedia, Freebase, etc.). [Figure: the full taxonomy of prior work shown earlier, from JointEnt/KnowBert/DEER/ERNIE/BertEnt/EntEval through Multihop-GAT, TuckER, ComplEx, DistMult, AATE, and MutualAtt.]
38. Scaling problem: nodes: 2,575,340; edges: 24,862,972; relation kinds: 599. The computational cost is too high, so the experiment cannot be run at this scale.
39. Scaling problem (cont.): the cost arises when backpropagating a full softmax over all entity embeddings. [Figure: forward and backward passes through the entity-embedding matrix and the softmaxed scores for a given (entity, relation) query.]
40. Scaling problem (cont.): a full softmax over every entity in the KB is not realistic.
41. Scaling problem (cont.): on a mini-dataset we can run a full softmax over all entities, but what about a large-scale dataset?
42. KGC with large-scale data: only a few studies exist(!) [Balkır et al., EMNLP '19].
43. KGC with large-scale data (cont.): both use a sampled softmax. (A sketch is shown below.)
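A minimal sketch of a sampled-softmax style loss for this setting: the gold tail is scored against K uniformly sampled negative entities instead of all ~2.5M, so only the sampled embedding rows receive gradients. This is a simplified illustration, not the exact scheme of the cited work.

```python
# Minimal sketch (PyTorch) of a sampled-softmax loss over a huge entity set.
import torch
import torch.nn.functional as F

def sampled_softmax_loss(query_emb, entity_emb, gold_ids, num_samples=1024):
    """
    query_emb:  (B, d)       e.g. a (head, rel)-conditioned query vector.
    entity_emb: nn.Embedding over all entities; only sampled rows get gradients.
    gold_ids:   (B,)         gold tail entity indices.
    """
    B = query_emb.size(0)
    # Uniformly sampled negatives (collisions with the gold are ignored for simplicity).
    neg_ids = torch.randint(0, entity_emb.num_embeddings, (B, num_samples),
                            device=query_emb.device)
    cand_ids = torch.cat([gold_ids.unsqueeze(1), neg_ids], dim=1)   # gold in column 0
    cand = entity_emb(cand_ids)                                     # (B, 1+K, d)
    logits = torch.einsum('bd,bkd->bk', query_emb, cand)            # (B, 1+K)
    target = torch.zeros(B, dtype=torch.long, device=query_emb.device)
    return F.cross_entropy(logits, target)
```

This keeps the backward pass proportional to the number of sampled candidates rather than to the full entity vocabulary, which is the point of the scaling discussion above.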
44. Where my interest lies: not relying on massive unlabeled text; leveraging both KG relations and texts; getting a (densified) entity representation for Entity Linking; and coping with the KB-scale problem.