Expanding Taxonomies with Implicit Edge Semantics

25-minute talk at The Web Conference (WWW) 2020.
Project website: http://cmuarborist.github.io

Emaad Manzoor

April 23, 2020

Transcript

  1. 4.

    Taxonomies: collections of related concepts. Example: geographic entities (maps.google.com): Asia, S. Asia, E. Asia, India, Pakistan, Nepal, Mumbai, Delhi.
  2. 5.

    Taxonomies: collections of related concepts. Example: musical genres (musicmap.info): Rock, Punk, Alternative, Grunge, Rapcore, Indie, Instrumental, Vocal.
  3. 6.

    Taxonomies: collections of related concepts. Example: product categories: Apparel, Clothing, Jewelry, Active, Sleep, Formal, Cycling, Football.
  4. 10.

    Taxonomies: collections of related concepts structured as a directed graph encoding a hierarchy. A PARENT is (i) related to, and (ii) more general than, its CHILD [Distributional Inclusion Hypothesis; Geffet & Dagan, ACL 2005].
  5. 12.

    Taxonomies help improve performance in: • Classification [Babbar et al., 2013] • Recommendations [He et al., 2016] • Search [Agrawal et al., 2009] • User modeling [Menon et al., 2011]
  6. 13.

    The Pinterest Taxonomy: helps improve performance in classification [Babbar et al., 2013], recommendations [He et al., 2016], search [Agrawal et al., 2009], and user modeling [Menon et al., 2011].
  7. 16.

    100% expert-curated: 8 curators, 1 month, 6,000 nodes. Rafael S. Gonçalves, Matthew Horridge, Rui Li, Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, and David Temple. Use of OWL and Semantic Web Technologies at Pinterest. International Semantic Web Conference, 2019. [Best Paper, In-Use Track]
  8. 17.
  9. 21.

    Problem: given a taxonomy with node feature vectors and an unseen query node q, rank the taxonomy nodes such that the true parents of q are ranked high.
  10. 22.

    Problem: want easy human verification. Predicted parents should be near the true parents, quantified by the shortest-path distance from the top-ranked prediction.
  11. 24.

    Challenge: lexical memorization. Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. Do supervised distributional methods really learn lexical inference relations? NAACL-HLT 2015.
  12. 29.

    Challenge: edge semantics are heterogeneous and unobserved (Ronaldo is-a Sportsman; Paris is-in France). We want to learn these semantics from the natural organization used by taxonomists to serve business needs.
  13. 30.

    Challenges: lexical memorization; edge semantics that are heterogeneous and unobserved; and the need for predictions that humans can verify (easy fix vs. hard fix relative to a query's true parent).
  14. 35.

    Taxonomic relatedness. Child u and parent v have node feature vectors e_u and e_v. Relatedness score: s(u, v) = (e_u M) ⋅ e_v.
  15. 36.

    Taxonomic relatedness: s(u, v) = (e_u M) ⋅ e_v, where the matrix M is learned from data.
  16. 37.

    Taxonomic relatedness: s(u, v) = (e_u M) ⋅ e_v assumes homogeneous edge semantics.
  17. 38.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v.
  18. 39.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v, where M_v is a node-local linear map.
  19. 40.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v requires O(d² |V|) parameters.
  20. 41.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v.
  21. 42.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v, with M_v = ∑_{i=1}^{k} w_v[i] × P_i.
  22. 44.

    Taxonomic relatedness: M_v = ∑_{i=1}^{k} w_v[i] × P_i is the transformation matrix of node v.
  23. 45.

    Taxonomic relatedness: M_v = ∑_{i=1}^{k} w_v[i] × P_i, the transformation matrix of node v, with k latent edge semantics.
  24. 46.

    Taxonomic relatedness: M_v = ∑_{i=1}^{k} w_v[i] × P_i, where P_i is the linear map for latent semantic type i.
  25. 47.

    Taxonomic relatedness: M_v = ∑_{i=1}^{k} w_v[i] × P_i, where w_v is the taxonomic “role” of parent v over the k latent edge semantics.
  26. 49.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v, with M_v = ∑_{i=1}^{k} w_v[i] × P_i.
  27. 50.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v, with M_v = ∑_{i=1}^{k} w_v[i] × P_i. To learn: P_1, …, P_k and f : ℝ^d → ℝ^k.
  28. 51.

    Taxonomic relatedness: s(u, v) = (e_u M_v) ⋅ e_v, with M_v = ∑_{i=1}^{k} w_v[i] × P_i. Only O(d²k + |f|) parameters; information-sharing across nodes; robust to noise.
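The scoring model above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the dimensions, random feature vectors, and role weights w_v are stand-ins (in the method, w_v is produced by a learned function f and all parameters are trained jointly).

```python
import numpy as np

d, k = 8, 3                        # feature dimension d, latent edge-semantic types k
rng = np.random.default_rng(0)

P = rng.normal(size=(k, d, d))     # shared linear maps P_1, ..., P_k
e_u = rng.normal(size=d)           # child feature vector e_u
e_v = rng.normal(size=d)           # parent feature vector e_v
w_v = rng.normal(size=k)           # parent's taxonomic "role" weights w_v

# M_v = sum_i w_v[i] * P_i: the parent-specific transformation matrix
M_v = np.tensordot(w_v, P, axes=1)

# relatedness score s(u, v) = (e_u M_v) . e_v
s_uv = (e_u @ M_v) @ e_v
print(float(s_uv))
```

Because the k maps P_i are shared across all nodes, only O(d²k + |f|) parameters are learned, versus O(d² |V|) for fully node-local maps.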
  29. 58.

    Large-margin loss. Loss function: ∑_{(u,v,v′)} [ s(u, v′) − s(u, v) + γ ]+
  30. 61.

    Large-margin loss: how to pick the margin γ? Option 1: a heuristic constant. Option 2: tune on a validation set.
  31. 62.

    Large-margin loss: how to pick the margin γ? Option 1: a heuristic constant. Option 2: tune on a validation set. Option 3: learn from data.
  32. 63.

    Large-margin loss: how to pick the margin γ? Option 1: a heuristic constant. Option 2: tune on a validation set. Option 3: learn from data. Our approach: dynamic margins.
  33. 66.

    Large-margin loss. Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), …
  34. 67.

    Large-margin loss. Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), then loss ≥ ∑_{(u,v)} shortest-path distance(v, v̂(u)).
  35. 68.

    Large-margin loss. Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), then loss ≥ ∑_{(u,v)} shortest-path distance(v, v̂(u)), where v is the true parent and v̂(u) the predicted parent.
  36. 69.

    Large-margin loss. Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), then loss ≥ ∑_{(u,v)} shortest-path distance(v, v̂(u)). Even incorrect predictions land near the true parent: easier human verification!
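A toy sketch of the dynamic-margin hinge loss, with hand-picked scores and distances (in the method, scores come from the learned model and distances from the taxonomy graph):

```python
def hinge_loss(s_true, s_neg, margin):
    """Violation term [s(u, v') - s(u, v) + gamma]_+ for one triple (u, v, v')."""
    return max(0.0, s_neg - s_true + margin)

# one child u with true parent v, and two sampled non-parents v'
s_u_v = 2.0                           # score s(u, v) of the true parent
negatives = [
    (1.5, 1.0),                       # (s(u, v'), shortest-path distance d(v, v'))
    (2.3, 3.0),
]

# dynamic margin: gamma(u, v, v') = shortest-path distance d(v, v')
loss = sum(hinge_loss(s_u_v, s_neg, dist) for s_neg, dist in negatives)
print(loss)  # max(0, -0.5 + 1.0) + max(0, 0.3 + 3.0) = 0.5 + 3.3
```

Nearby non-parents get a small margin while distant ones must be pushed far below the true parent, which is what yields the shortest-path bound in the proposition.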
  37. 72.

    Large-margin loss: negative sampling. It is infeasible to sample all non-parents v′, and with naive sampling the loss rapidly drops to 0, leaving no “active” samples.
  38. 73.

    Large-margin loss: negative sampling. Infeasible to sample all non-parents v′; the loss rapidly drops to 0 with no “active” samples. Solution: distance-weighted sampling [Wu et al., 2017].
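A simplified sketch of distance-weighted negative sampling: non-parents closer to the true parent in embedding space are sampled more often, keeping the hinge terms “active”. For illustration this weights by inverse distance; Wu et al. (2017) weight by the inverse of the pairwise-distance density, and the embeddings here are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

emb = rng.normal(size=(100, 16))       # toy node embeddings
true_parent = 0
candidates = np.arange(1, 100)         # candidate non-parents v'

# sample harder (closer) negatives more often
dists = np.linalg.norm(emb[candidates] - emb[true_parent], axis=1)
weights = 1.0 / np.maximum(dists, 1e-6)
probs = weights / weights.sum()

negatives = rng.choice(candidates, size=5, replace=False, p=probs)
print(sorted(negatives.tolist()))
```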
  39. 77.

    Datasets: 3 textual taxonomies.

    |                         | Pinterest | SemEval | Mammal |
    |-------------------------|-----------|---------|--------|
    | No. of edges            | 10,768    | 18,827  | 5,765  |
    | No. of nodes            | 10,792    | 8,154   | 5,080  |
    | Training nodes          | 7,919     | 7,374   | 4,543  |
    | Test nodes              | 2,873     | 780     | 537    |
    | Depth                   | 7         | ∞       | 18     |
    | Heterogeneous semantics | ✓         | ✗       | ✓      |
  40. 78.

    Datasets: Pinterest.
  41. 79.

    Datasets: Pinterest. Heterogeneous semantics.
  42. 80.

    Datasets: Pinterest. Heterogeneous semantics; nodes can be concrete (New York) or abstract (Mental Wellbeing).
  43. 81.

    Datasets: Pinterest. Heterogeneous semantics; nodes can be concrete (New York) or abstract (Mental Wellbeing); PinText embeddings used for each node [Zhuang and Liu, 2019].
  44. 82.

    Datasets: SemEval.
  45. 83.

    Datasets: SemEval. From the SemEval 2018 hypernym discovery task.
  46. 84.

    Datasets: SemEval. From the SemEval 2018 hypernym discovery task; homogeneous “is-a” semantics.
  47. 85.

    Datasets: SemEval. From the SemEval 2018 hypernym discovery task; homogeneous “is-a” semantics; FastText embeddings used for each node [Bojanowski et al., 2017].
  48. 86.

    Datasets: Mammal.
  49. 87.

    Datasets: Mammal. WordNet noun subgraph rooted at mammal.n.01.
  50. 88.

    Datasets: Mammal. WordNet noun subgraph rooted at mammal.n.01; 3 edge types: is-a, is-part-of-whole, is-part-of-substance.
  51. 89.

    Datasets: Mammal. WordNet noun subgraph rooted at mammal.n.01; 3 edge types: is-a, is-part-of-whole, is-part-of-substance; FastText embeddings used for each node.
  52. 90.

    Datasets: evaluation setup. 15% of leaf nodes and their outgoing edges are held out for testing; the remaining child-parent pairs are used for training.
  53. 91.

    Datasets: metrics. • Mean Reciprocal Rank (MRR): from 0% to 100% (best) • Recall@15: from 0% to 100% (best) • Mean shortest-path distance (SPDist)
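The two ranking metrics can be computed from the rank of each test node's best-ranked true parent; a minimal sketch (the ranks below are made up):

```python
def mrr(ranks):
    """Mean Reciprocal Rank over 1-based ranks of the best true parent."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def recall_at(ranks, k=15):
    """Fraction of queries whose best true parent appears in the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

ranks = [1, 2, 20, 5]        # best true-parent rank for each test query
print(mrr(ranks))            # (1 + 1/2 + 1/20 + 1/5) / 4 = 0.4375, up to float rounding
print(recall_at(ranks))      # 3 of 4 ranks are <= 15 -> 0.75
```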
  54. 94.

    Hypernym detectors. Q. Can hypernym detectors be repurposed for taxonomy expansion? [Baroni et al., 2012; Roller et al., 2014; Weeds et al., 2014; Shwartz et al., 2016]
  55. 95.

    Hypernym detectors: classification F1 scores.

    | F1     | Pinterest | SemEval | Mammal |
    |--------|-----------|---------|--------|
    | CONCAT | 86.5%     | 59.3%   | 72.1%  |
    | SUM    | 87.7%     | 60.6%   | 77.2%  |
    | DIFF   | 87.0%     | 63.4%   | 75.7%  |
    | PROD   | 86.0%     | 65.7%   | 78.0%  |
  56. 96.

    Hypernym detectors: classification F1 scores. Each detector is a vector operation followed by a random forest classifier.
  57. 97.

    Hypernym detectors: classification F1 scores. Each detector is a vector operation followed by a random forest classifier, trained and tested on a balanced sample of node pairs.
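The four detectors differ only in how a (child, candidate-parent) embedding pair is turned into classifier features; a sketch with random stand-in embeddings (the talk uses PinText/FastText vectors with a random forest on top):

```python
import numpy as np

rng = np.random.default_rng(0)
e_u, e_v = rng.normal(size=16), rng.normal(size=16)   # child / candidate-parent embeddings

features = {
    "CONCAT": np.concatenate([e_u, e_v]),  # [e_u ; e_v]
    "SUM":    e_u + e_v,
    "DIFF":   e_u - e_v,
    "PROD":   e_u * e_v,                   # element-wise product
}
for name, x in features.items():
    print(name, x.shape)                   # CONCAT is 2d-dimensional, the rest d-dimensional
```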
  58. 98.

  59. 99.

    Hypernym detectors: 1. Reasonably good performance overall.
  60. 100.

    Hypernym detectors: 1. Reasonably good performance overall. 2. Better embeddings correlated with better performance.
  61. 101.

    Hypernym detectors: 1. Reasonably good performance overall. 2. Better embeddings correlated with better performance. 3. No single dominant hypernym detector.
  62. 102.

    Hypernym detectors: Mean Reciprocal Ranks.

    | MRR    | Pinterest | SemEval | Mammal |
    |--------|-----------|---------|--------|
    | CONCAT | 41.8%     | 21.0%   | 15.0%  |
    | SUM    | 33.9%     | 17.8%   | 19.6%  |
    | DIFF   | 41.2%     | 18.5%   | 31.4%  |
    | PROD   | 42.2%     | 17.5%   | 32.2%  |
  63. 103.

    Hypernym detectors: Mean Reciprocal Ranks are uncorrelated with classification performance.
  64. 104.

    Hypernym detectors: Mean Reciprocal Ranks are uncorrelated with classification performance. An explicit formulation of taxonomy expansion as a ranking task is needed.
  65. 107.

    Taxonomy expansion. Q. Does explicitly accommodating heterogeneous edge semantics help?

    | MRR       | Pinterest | SemEval | Mammal |
    |-----------|-----------|---------|--------|
    | CRIM      | 53.2%     | 41.7%   | 21.3%  |
    | This work | 59.0%     | 43.4%   | 29.4%  |
  66. 108.

    Taxonomy expansion: comparison with CRIM [Bernier-Colborne & Barriere, 2018], which models homogeneous edge semantics with a skip-gram-negative-sampling-like loss function.
  67. 109.

    Taxonomy expansion: compared with CRIM [Bernier-Colborne & Barriere, 2018], our method has better ranking performance when the taxonomy has heterogeneous edge semantics.
  68. 110.

    Taxonomy expansion: comparison with CRIM [Bernier-Colborne & Barriere, 2018]. Our method predicts parents that are closer to the true parents for taxonomies with heterogeneous edge semantics.

    | SPDist    | Pinterest | SemEval | Mammal |
    |-----------|-----------|---------|--------|
    | CRIM      | 2.4       | 2.7     | 4.1    |
    | This work | 2.2       | 2.9     | 3.2    |
  69. 113.

    Example predictions on Pinterest (correct parents are shown in bold on the original slide).

    | Query          | Predicted parents |
    |----------------|-------------------|
    | luxor          | africa travel, european travel, asia travel, greece |
    | 2nd month baby | baby stage, baby, baby names, preparing for baby |
    | depression     | mental illness, stress, mental wellbeing, disease |
    | ramadan        | hosting occasions, holiday, sukkot, middle east & african cuisine |
    | minion humor   | humor, people humor, character humor, funny |
  70. 114.

    Example predictions: concrete concept with “is-in” semantics (e.g., luxor).
  71. 115.

    Example predictions: abstract concept with “is-type-of” semantics (e.g., depression).
  72. 116.

  73. 117.

    Example failures on Pinterest (no correct parent in the top 4 predictions).

    | Query              | Predicted parents |
    |--------------------|-------------------|
    | artificial flowers | planting, dried flowers, DIY flowers, edible seeds |
    | thor               | adventure movie, action movie, science movie, adventure games |
    | smartwatch         | wearable devices, phone accessories, electronics, computer |
    | disney makeup      | halloween makeup, makeup, costume makeup, character makeup |
    | holocaust          | history, german history, american history, world war |
  74. 118.

    Test for data leakage: predictions for Pinterest search queries not present in the taxonomy.

    | Query                   | Predicted parents |
    |-------------------------|-------------------|
    | what causes blackheads  | skin concern, mental illness, feelings, disease |
    | meatloaf cupcakes       | cupcakes, dessert, no bake meals, steak |
    | benefits of raw carrots | food and drinks, vegetables, diet, healthy recipes |
    | kids alarm clock        | toddlers and preschoolers, child care, baby sleep issues, baby |
    | humorous texts          | poems, quotes, authors, religious studies |
  75. 119.

  76. 120.

    More in paper (Expanding Taxonomies with Implicit Edge Semantics, WWW ’20, April 20–24, 2020, Taipei, Taiwan): ablation study, impact of hyperparameters, inferring taxonomic roles.
  77. 121.
  78. 123.

    Summary: expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification. 1. Taxonomic roles with linear maps.
  79. 124.

    Summary: expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification. 1. Taxonomic roles with linear maps. 2. Large-margin loss with dynamic margins.
  80. 125.

    Summary: expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification. 1. Taxonomic roles with linear maps. 2. Large-margin loss with dynamic margins. 3. Guarantees to ease human verification.