
Entity representation with relational attention

izuna385
November 26, 2019


Combining TuckER [Balazevic et al., '19] with label attention [Wang et al., '18].

Experiments on obtaining entity representations by leveraging both KG relations and texts.
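
As a rough illustration of the TuckER side of this combination, the score of a triple is the core tensor contracted with the subject, relation and object embeddings. The sketch below is a minimal NumPy version under stated assumptions: the function names, toy dimensions and random values are made up for illustration, and in this deck's setting the entity embeddings would come from an encoder over definition text rather than from a lookup table.

```python
import numpy as np

def tucker_score_all(core, e_s, w_r, entity_matrix):
    """Score one (subject, relation) pair against every candidate object.

    core:          (d_e, d_r, d_e) core tensor W
    e_s:           (d_e,) subject entity embedding
    w_r:           (d_r,) relation embedding
    entity_matrix: (n_entities, d_e) candidate object embeddings
    """
    # W x_1 e_s x_2 w_r leaves a vector over the object mode, shape (d_e,)
    sr = np.einsum("i,j,ijk->k", e_s, w_r, core)
    # x_3 e_o for every candidate at once, shape (n_entities,)
    return entity_matrix @ sr

def tucker_score(core, e_s, w_r, e_o):
    """Score a single triple: W x_1 e_s x_2 w_r x_3 e_o."""
    return float(np.einsum("i,j,k,ijk->", e_s, w_r, e_o, core))

# Toy sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
d_e, d_r, n_entities = 8, 4, 5
core = rng.normal(size=(d_e, d_r, d_e))
entities = rng.normal(size=(n_entities, d_e))
w_r = rng.normal(size=d_r)

scores = tucker_score_all(core, entities[0], w_r, entities)
probs = 1.0 / (1.0 + np.exp(-scores))  # TuckER passes the score through a sigmoid
```

Contracting the subject and relation first means the remaining work is a single matrix-vector product over all candidate objects, which is what makes 1-to-all scoring cheap on a small KB.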


Transcript

  1. Motivation (new)

    • Not relying on massive unlabeled text,
    • by leveraging both KG relations and texts,
    • get (densified) entity representations
    • for Entity Linking,
    • coping with the KB-scale problem.
  2. How to get entity rep.?

    • Massive approach
    • Solving KGC approach
    • Solving KGC with text approach
  3. Massive approach

    [Flowchart: approaches are classified by the questions “Use massive text?”, “Require “entity-span” annotations?” and “Use relations?” (Yes/No branches; one leaf is marked ×). Methods shown: JointEnt [Yamada et al., ACL ’17]; KnowBert [Peters et al., EMNLP ’19] (partially annotated data used); DEER [Gillick et al., CoNLL ’19]; ERNIE [Zhang et al., ACL ’19]; BertEnt [Yamada et al., ’19]; EntEval [Chen et al., EMNLP ’19].]
  4. Massive approach

    [Same flowchart as the previous slide.]
    • Needs lots of mention-entity annotations
  5. Solving KGC (with text) approach

    [Flowchart: approaches are classified by the questions “Use massive text?”, “Require “entity-span” annotations?” and “Use relations?” (Yes/No branches; one leaf is marked ×). Methods shown: JointEnt [Yamada et al., ACL ’17]; KnowBert [Peters et al., EMNLP ’19] (partially annotated data used); DEER [Gillick et al., CoNLL ’19]; ERNIE [Zhang et al., ACL ’19]; BertEnt [Yamada et al., ’19]; EntEval [Chen et al., EMNLP ’19]. A further question, “Relations or adding text?”, splits into Relations (Solving KGC approach) and Relations + text (Solving KGC + text approach).]
  6. Solving KGC (with text) approach

    [Same flowchart as the previous slide.]
    • Relations: Multihop-GAT [Nathani et al., EMNLP ’19], TuckER [Balazevic et al., EMNLP ’19], ComplEx, DistMult, … (SoTA)
  7. Solving KGC (with text) approach

    [Same flowchart as the previous slide.]
    • Relations: Multihop-GAT [Nathani et al., EMNLP ’19], TuckER [Balazevic et al., EMNLP ’19], ComplEx, DistMult, … (SoTA)
    • Relations + text: AATE [An et al., NAACL ’18], MutualAtt [Han et al., AAAI ’18], …
  8. Solving KGC (with text) approach

    [Same flowchart and method lists as the previous slide.]
    • Closed setting
    • Scaling problem not considered
  9. KGC-based entity emb. problem

    • Only small KBs are considered, under a closed setting.
    • For example, [Nathani et al., EMNLP ’19] requires 2-hop attention, which has a high computational cost when applied to a real KB (~1,000,000).
    • On a real KB, we can’t backpropagate through a full softmax. [Balkır et al., EMNLP ’19]
  10. Solving KGC with text approach

    [Flowchart: approaches are classified by the questions “Use massive text?”, “Require “entity-span” annotations?” and “Use relations?” (Yes/No branches; one leaf is marked ×). Methods shown: JointEnt [Yamada et al., ACL ’17]; KnowBert [Peters et al., EMNLP ’19] (partially annotated data used); DEER [Gillick et al., CoNLL ’19]; ERNIE [Zhang et al., ACL ’19]; BertEnt [Yamada et al., ’19]; EntEval [Chen et al., EMNLP ’19]. A further question, “Relations or adding text?”, splits into Relations (Multihop-GAT [Nathani et al., EMNLP ’19], TuckER [Balazevic et al., EMNLP ’19], ComplEx, DistMult, … (SoTA)) and Relations + text (AATE [An et al., NAACL ’18], MutualAtt [Han et al., AAAI ’18], …).]
  11. [Han et al., AAAI ’18]

    • Attention between rel. and entity def.
    • It needs a lot of sentences that contain two entities. In practice, they use the NYTimes corpus + distant supervision.
  12. [Han et al., AAAI ’18]

    • Attention between rel. and entity def.
    • It needs a lot of sentences that contain two entities. In practice, they use the NYTimes corpus + distant supervision.
    • Not closed in KB.
  13. Intuition

    • Why not use information from the entity strings and definitions themselves?
    [Figure labels: Definitions; Relations (triplets)]
  14. Intuition

    • def. text + rel. --> def. text [Xu et al., IJCAI ’16]
    [Figure labels: from KG; from KG]
  15. Baseline Model: DefinitionTuckER

    • The encoded definition sentence is used as the entity embedding.
    • Baseline: No interaction between entity and rel.
    [Diagram labels: Encoder; Encoder]
  16. Previous entity-relation interaction studies

    • They consider interaction inside the score function (too many)
    • How about interaction between the entity itself and the relation?
  17. Proposal: Relational attention to entity definition

    • The encoded definition sentence is used as the entity embedding.
    [Diagram labels: Encoder; Encoder]
  18. Label Attention to sentence

    • [Wang et al., ACL ’18]
    [Diagram labels: sentence seq.; sentence emb.; label emb. matrix (learned); label-attended sentence emb.]
    β: label-emb. attention.
  19. Relation Attention to entity definition

    • [Wang et al., ACL ’18]
    [Diagram labels: sentence seq. (entity definition); sentence emb. (entity emb.); label emb. matrix (learned), annotated “Use relation emb. matrix here”.]
    • Get relation-attended entity emb.
  20. Training

    • In-batch negative sampling [Henderson et al., ’17; Gillick et al., ’18]
    [Diagram: one batch of head def., rel. and tail def. entries.]
  21. Training

    • In-batch negative sampling [Henderson et al., ’17; Gillick et al., ’18]
    [Diagram: within one batch, head def. × rel. is scored against the tail defs.; cells are labeled gold or negative.]
  22. Experiment

    • Dataset: DBPedia50k
    • Task: Predict (entity def. + rel. --> ?) in test data.
    • Evaluation: Standard KGC settings.
  23. Experiment setting

    • Baseline: Definition encoder: ELMo + StackedAttLSTM + attention sum
    • Proposal: Definition encoder: ELMo + StackedAttLSTM + RelationAtt. + attention sum
  24. Result@test

    Model                  Hits@1   Hits@10
    Baseline               0.0      0.0
    Baseline + Rel att.    1.2      7.6

    Reference: ConMask [Shi and Weninger, AAAI ’18] 81.0
    • Evaluation: 1-to-all entities
    • Test data: 10969 triplets
    NOTE:
    • My experiment ran for only 20 iterations.
    • For comparison, TuckER [Balazevic et al., ’19] requires 500 iterations.
  25. In-batch Evaluation

    • In-batch: 1-to-batch-size classification
    [Diagram: within one batch, head def. × rel. is scored against the tail defs. Softmaxed scores: 0.12, 0.05, 0.31, 0.22, 0.01, 0.13; one candidate is marked gold.]
    In this case, the model failed to predict the tail.
  26. Result@test

    Model                  Hits@1
    Baseline               0.7
    Baseline + Rel att.    46.67

    • In-batch evaluation
    • batch_size: 128 (= 1-to-128 classification problem)
  27. Conclusion

    • Definition-only based KGC seems to fail.
    • Maybe another score function can improve results.
    • Do we need a “solely-KG-based” emb. signal, too?
  28. Do we need solely-KG-based emb., too?

    • A very simple model [Shah et al., AAAI ’19] gets good results.
  29. [Shah et al., AAAI ’19] model

    • 1. Get text-based entity emb. by averaging word embeddings.
  30. [Shah et al., AAAI ’19] model

    • 2. They train KG-based entity emb., independent of the text.
  31. [Shah et al., AAAI ’19] model

    • 3. Learn a projection from text emb. to KG emb.
    [Diagram: text-entity emb. is mapped to KG emb.; the mapping is learned.]
  32. [Shah et al., AAAI ’19] model

    • Evaluation: Open-world setting. Test data: unknown entity + its definition
  33. [Shah et al., AAAI ’19] model

    • Evaluation: Open-world setting. Test data: unknown entity + its definition
    [Diagram step: word emb. averaging]
  34. [Shah et al., AAAI ’19] model

    • Evaluation: Open-world setting. Test data: unknown entity + its definition
    [Diagram steps: word emb. averaging; project to KG space by the learned projection]
  35. [Shah et al., AAAI ’19] model

    [Same content as the previous slide.]
  36. Conclusion

    • Definition-only based KGC seems to fail.
    • Maybe we need a “solely-KG-based” emb. signal, too.
  37. Previous Entity rep. studies (~CoNLL ’19)

    • Entity: existing in KB (e.g. Wikipedia, Freebase, etc.)
    [Flowchart: approaches are classified by the questions “Use massive text?”, “Require “entity-span” annotations?” and “Use relations?” (Yes/No branches; one leaf is marked ×). Methods shown: JointEnt [Yamada et al., ACL ’17]; KnowBert [Peters et al., EMNLP ’19] (partially annotated data used); DEER [Gillick et al., CoNLL ’19]; ERNIE [Zhang et al., ACL ’19]; BertEnt [Yamada et al., ’19]; EntEval [Chen et al., EMNLP ’19]. A further question, “Relations or adding text?”, splits into Relations (Multihop-GAT [Nathani et al., EMNLP ’19], TuckER [Balazevic et al., EMNLP ’19], ComplEx, DistMult, … (SoTA)) and Relations + text (AATE [An et al., NAACL ’18], MutualAtt [Han et al., AAAI ’18], …).]
  38. Scaling problem

    • Nodes: 2,575,340, edges: 24,862,972, relation kinds: 599
    • The computational cost is too high; the experiment can’t be conducted.
  39. Scaling problem

    • Nodes: 2,575,340, edges: 24,862,972, relation kinds: 599
    • The computational cost is too high when backpropagating a full softmax to all entity embeddings; the experiment can’t be conducted.
    [Diagram: Ent. and rel. are scored against the Entity Emb. Matrix; softmax scores 0.02, 0.14, 0.21, 0.01, 0.13, 0.05; forward and backward arrows.]
  40. Scaling problem

    [Same content and diagram as the previous slide.]
    • Full softmax over all KB entities: NOT realistic
  41. Scaling problem

    • Nodes: 2,575,340, edges: 24,862,972, relation kinds: 599
    • The computational cost is too high; the experiment can’t be conducted.
    ・ On a mini-dataset, we can run a full softmax over all entities.
    ・ How about a large-scale dataset?
  42. KGC with large-scale data

    • Only a few studies exist(!) [Balkır et al., EMNLP ’19]
  43. KGC with large-scale data

    • Only a few studies exist(!) [Balkır et al., EMNLP ’19]
    • Both use sampled softmax.
  44. Where does my interest lie?

    • Not relying on massive unlabeled text,
    • by leveraging both KG relations and texts,
    • get (densified) entity representations
    • for Entity Linking,
    • coping with the KB-scale problem.
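
The "Relation Attention to entity definition" slide swaps the learned label embedding matrix of [Wang et al., ACL ’18] for a relation embedding matrix. A minimal NumPy sketch of that idea follows, under loud assumptions: the function name and shapes are hypothetical, the phrase-window convolution and ReLU step of the original label attention is omitted, and pooling over all relations (rather than conditioning on the triple's own relation) is a guess, since the transcript does not say which variant the deck uses.

```python
import numpy as np

def relation_attended_embedding(token_vecs, relation_matrix, eps=1e-8):
    """Attend over definition tokens using relation embeddings as 'labels'.

    token_vecs:      (L, d) encoded tokens of one entity definition
    relation_matrix: (K, d) relation embeddings (stand-in for label embeddings)
    Returns the attended entity embedding (d,) and the attention weights (L,).
    """
    # Cosine similarity between every token and every relation, shape (L, K)
    t = token_vecs / (np.linalg.norm(token_vecs, axis=1, keepdims=True) + eps)
    r = relation_matrix / (np.linalg.norm(relation_matrix, axis=1, keepdims=True) + eps)
    compat = t @ r.T
    # Max-pool over relations: how strongly each token matches any relation
    m = compat.max(axis=1)
    # Softmax over the token positions gives the attention weights beta
    m = m - m.max()
    beta = np.exp(m) / np.exp(m).sum()
    # Weighted sum of token vectors is the relation-attended entity embedding
    return beta @ token_vecs, beta

# Toy inputs: 6 definition tokens, 3 relations, dimension 4 (illustrative only).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
relations = rng.normal(size=(3, 4))
entity_emb, beta = relation_attended_embedding(tokens, relations)
```

In the deck's proposal, `token_vecs` would be the ELMo + StackedAttLSTM outputs for the definition, and the resulting vector would replace the plain attention-summed embedding used by the baseline.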
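
The "Training" and "In-batch Evaluation" slides treat the other tails in a batch as negatives, which turns each example into a 1-to-batch-size classification problem. The sketch below assumes a plain dot product between a combined (head def. + rel.) vector and each tail vector; that scoring choice and the function name are illustrative assumptions, not the deck's actual score function.

```python
import numpy as np

def in_batch_softmax_loss(queries, tails):
    """Cross-entropy where row i's gold tail is tails[i] and the rest are negatives.

    queries: (B, d) representations of (head def., rel.) pairs
    tails:   (B, d) representations of the gold tail definitions
    Returns the mean loss and the in-batch Hits@1.
    """
    logits = queries @ tails.T                            # (B, B) score matrix
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    gold = np.arange(len(queries))                        # gold sits on the diagonal
    loss = -log_probs[gold, gold].mean()
    hits_at_1 = float((log_probs.argmax(axis=1) == gold).mean())
    return loss, hits_at_1

# Toy batch of 4 where every query matches its own tail (illustrative only).
toy_queries = 5.0 * np.eye(4)
toy_tails = np.eye(4)
toy_loss, toy_hits = in_batch_softmax_loss(toy_queries, toy_tails)
```

With batch_size 128, as on the in-batch result slide, the score matrix is 128 × 128, so each example is ranked against 127 negatives instead of every entity in the KB.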
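
The "Scaling problem" slides contrast a full softmax over every entity with the sampled softmax used for large-scale KGC. The following sketch shows the difference in cost on toy data. It is a simplified assumption-laden version: negatives are drawn uniformly, no correction for the sampling distribution is applied, and all names are hypothetical.

```python
import numpy as np

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

def full_softmax_loss(query, entity_matrix, gold):
    """Normalizes over all entities: cost and gradient touch every row."""
    return -log_softmax(entity_matrix @ query)[gold]

def sampled_softmax_loss(query, entity_matrix, gold, num_neg, rng):
    """Normalizes over the gold entity plus num_neg uniformly sampled negatives."""
    n = entity_matrix.shape[0]
    candidates = np.delete(np.arange(n), gold)            # everything except gold
    neg = rng.choice(candidates, size=num_neg, replace=False)
    idx = np.concatenate(([gold], neg))                   # gold is at position 0
    return -log_softmax(entity_matrix[idx] @ query)[0]

# Toy KB of 50 entities with dimension 8 (illustrative only).
rng = np.random.default_rng(0)
entity_matrix = rng.normal(size=(50, 8))
query = rng.normal(size=8)
full = full_softmax_loss(query, entity_matrix, gold=3)
sampled = sampled_softmax_loss(query, entity_matrix, gold=3, num_neg=10, rng=rng)
```

Only the sampled rows of the entity matrix receive gradients in the sampled version, which is the property that matters when the KB has millions of nodes.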