
Expanding Taxonomies with Implicit Edge Semantics

25-minute talk at The Web Conference (WWW) 2020.
Project website: http://cmuarborist.github.io


Emaad Manzoor

April 23, 2020

Transcript

  1. Expanding Taxonomies with Implicit Edge Semantics Emaad Manzoor Dhananjay Shrouty

    Rui Li Jure Leskovec
  2. Expanding Taxonomies with Implicit Edge Semantics Emaad Manzoor Dhananjay Shrouty

    Rui Li Jure Leskovec
  3. Taxonomies Collection of related concepts

  4. Taxonomies Geographic entities Collection of related concepts Asia S. Asia

    E. Asia India Pakistan Nepal Mumbai Delhi maps.google.com
  5. Taxonomies Geographic entities Musical genres Collection of related concepts Rock

    Punk Alternative Grunge Rapcore Indie Instrumental Vocal musicmap.info
  6. Taxonomies Geographic entities Musical genres Product categories Collection of related

    concepts Apparel Clothing Jewelry Active Sleep Formal Cycling Football
  7. Taxonomies Collection of related concepts structured as a directed graph

  8. Taxonomies Collection of related concepts structured as a directed graph

    PARENT CHILD
  9. Taxonomies Collection of related concepts structured as a directed graph

    encoding a hierarchy PARENT CHILD
  10. Taxonomies Collection of related concepts structured as a directed graph

    encoding a hierarchy PARENT (i) related to, and (ii) more general than CHILD [Distributional Inclusion Hypothesis; Geffet & Dagan, ACL 2005]
  11. Taxonomies Additional Assumption Each concept has a non-taxonomic feature vector Word embeddings Image embeddings
  12. Taxonomies Help improve performance in: • Classification [Babbar et al., 2013] • Recommendations [He et al., 2016] • Search [Agrawal et al., 2009] • User modeling [Menon et al., 2011]
  13. The Pinterest Taxonomy Help improve performance in: • Classification [Babbar et al., 2013] • Recommendations [He et al., 2016] • Search [Agrawal et al., 2009] • User modeling [Menon et al., 2011]
  14. The Pinterest Taxonomy Hierarchy of interests ~11,000 nodes/edges 7 levels

    deep 100% expert curated
  15. 100% expert curated

  16. 100% expert curated Rafael S Gonçalves, Matthew Horridge, Rui Li,

    Yu Liu, Mark A Musen, Csongor I Nyulas, Evelyn Obamos, Dhananjay Shrouty, and David Temple. Use of OWL and Semantic Web Technologies at Pinterest. International Semantic Web Conference, 2019. [Best Paper, In-Use Track] 8 curators 1 month 6,000 nodes
  17. Problem

  18. Problem Given a taxonomy

  19. Problem Given a taxonomy with node feature vectors

  20. Problem Given a taxonomy with node feature vectors and unseen query node q
  21. Problem Given a taxonomy with node feature vectors and unseen query node q, rank the taxonomy nodes such that true parents of q are ranked high
  22. Problem Easy Human Verification Want predicted parents near true parents, quantified by shortest-path distance from the top-ranked prediction
  23. Challenges

  24. Challenges Lexical memorization Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. Do supervised distributional methods really learn lexical inference relations? NAACL-HLT 2015.
  25. Challenges Lexical memorization Edge semantics are heterogeneous

  26. Challenges Lexical memorization Edge semantics are heterogeneous Paris France Is-in

  27. Challenges Lexical memorization Ronaldo Sportsman Is-a Paris France Is-in Edge semantics are heterogeneous
  28. Challenges Lexical memorization Ronaldo Sportsman Is-a Paris France Is-in Edge semantics are heterogeneous and unobserved
  29. Challenges Lexical memorization Ronaldo Sportsman Is-a Paris France Is-in Edge semantics are heterogeneous and unobserved Want to learn these semantics from the natural organization used by taxonomists to serve business needs
  30. Challenges Need predictions for humans Lexical memorization Edge semantics are heterogeneous and unobserved Query q True Parent Easy fix Hard fix
  31. Outline 1. Modeling Taxonomic Relatedness 2. Learning, Prediction & Dynamic

    Margins 3. Evaluation
  32. Outline 1. Modeling Taxonomic Relatedness 2. Learning, Prediction & Dynamic

    Margins 3. Evaluation
  33. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v
  34. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v)
  35. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M) ⋅ e_v
  36. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M) ⋅ e_v Learn M from data
  37. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M) ⋅ e_v Assumes homogeneous edge semantics
  38. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M_v) ⋅ e_v
  39. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M_v) ⋅ e_v Node-local linear map
  40. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M_v) ⋅ e_v O(d² |V|) parameters
  41. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M_v) ⋅ e_v
  42. Taxonomic Relatedness parent v child u Node Feature Vectors e_u e_v Relatedness score s(u, v) = (e_u M_v) ⋅ e_v with M_v = ∑_{i=1}^k w_v[i] × P_i
  43. Taxonomic Relatedness M_v = ∑_{i=1}^k w_v[i] × P_i
  44. Taxonomic Relatedness M_v = ∑_{i=1}^k w_v[i] × P_i Transformation matrix of node v
  45. Taxonomic Relatedness M_v = ∑_{i=1}^k w_v[i] × P_i Transformation matrix of node v k latent edge semantics
  46. Taxonomic Relatedness M_v = ∑_{i=1}^k w_v[i] × P_i Transformation matrix of node v k latent edge semantics Linear map P_i for semantic type i
  47. Taxonomic Relatedness M_v = ∑_{i=1}^k w_v[i] × P_i Transformation matrix of node v k latent edge semantics Linear map P_i for semantic type i Taxonomic “role” w_v of parent v
  48. Taxonomic Relatedness w_v = f(e_v) Taxonomic “role” of parent v f is any learnable function
  49. Taxonomic Relatedness Relatedness score s(u, v) = (e_u M_v) ⋅ e_v with M_v = ∑_{i=1}^k w_v[i] × P_i
  50. Taxonomic Relatedness Relatedness score s(u, v) = (e_u M_v) ⋅ e_v with M_v = ∑_{i=1}^k w_v[i] × P_i To learn: P_1, …, P_k and f : ℝ^d → ℝ^k
  51. Taxonomic Relatedness Relatedness score s(u, v) = (e_u M_v) ⋅ e_v with M_v = ∑_{i=1}^k w_v[i] × P_i O(d²k + |f|) parameters Information-sharing across nodes Robust to noise
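Read together, slides 33–51 specify a compact scoring model. Below is a minimal PyTorch sketch reconstructed from the formulas above (class and variable names are illustrative, not the authors' Arborist implementation); f is assumed to be a single linear layer for concreteness, though any learnable function works:

```python
import torch
import torch.nn as nn

class TaxonomicRelatedness(nn.Module):
    """s(u, v) = (e_u M_v) . e_v with M_v = sum_i w_v[i] * P_i."""
    def __init__(self, d, k):
        super().__init__()
        # k shared linear maps, one per latent edge-semantic type.
        self.P = nn.Parameter(0.01 * torch.randn(k, d, d))
        # f: R^d -> R^k produces the taxonomic "role" w_v of a parent;
        # a single linear layer here is an assumption.
        self.f = nn.Linear(d, k)

    def score(self, e_u, e_v):
        w_v = self.f(e_v)                                # (batch, k)
        M_v = torch.einsum('bk,kij->bij', w_v, self.P)   # (batch, d, d)
        # Bilinear score (e_u M_v) . e_v per batch element.
        return torch.einsum('bi,bij,bj->b', e_u, M_v, e_v)
```

At query time, an unseen node q is scored against every taxonomy node and nodes are ranked by decreasing s(q, v); sharing the k maps across all nodes keeps the parameter count at O(d²k + |f|) rather than O(d² |V|).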
  52. Outline 1. Modeling Taxonomic Relatedness 2. Learning, Prediction & Dynamic

    Margins 3. Evaluation
  53. Outline 1. Modeling Taxonomic Relatedness 2. Learning, Prediction & Dynamic

    Margins 3. Evaluation
  54. Large-Margin Loss

  55. Large-Margin Loss Desired constraint s(child, parent) > s(child, non-parent) + γ

  56. Large-Margin Loss Violated constraint s(child, parent) ≤ s(child, non-parent) + γ
  57. Large-Margin Loss Constraint violation [s(child, non-parent) + γ − s(child, parent)]_+
  58. Large-Margin Loss Loss function ∑_{(u,v,v′)} [s(u, v′) − s(u, v) + γ]_+
  59. Large-Margin Loss How to pick the margin γ?

  60. Large-Margin Loss How to pick the margin γ? Option 1: Heuristic constant
  61. Large-Margin Loss How to pick the margin γ? Option 1: Heuristic constant Option 2: Tune on validation set
  62. Large-Margin Loss How to pick the margin γ? Option 1: Heuristic constant Option 2: Tune on validation set Option 3: Learn from data
  63. Large-Margin Loss How to pick the margin γ? Option 1: Heuristic constant Option 2: Tune on validation set Option 3: Learn from data Our approach: Dynamic margins
  64. Large-Margin Loss γ(u, v, v′)

  65. Large-Margin Loss γ(u, v, v′) = shortest-path distance(v, v′)
  66. Large-Margin Loss Proposition If γ(u, v, v′) = shortest-path distance(v, v′),
  67. Large-Margin Loss Proposition If γ(u, v, v′) = shortest-path distance(v, v′), then loss ≥ ∑_{(u,v)} shortest-path distance(v, v̂(u))
  68. Large-Margin Loss Proposition If γ(u, v, v′) = shortest-path distance(v, v′), then loss ≥ ∑_{(u,v)} shortest-path distance(v, v̂(u)) True parent v Predicted parent v̂(u)
  69. Large-Margin Loss Proposition If γ(u, v, v′) = shortest-path distance(v, v′), then loss ≥ ∑_{(u,v)} shortest-path distance(v, v̂(u)) Query q True Parent Easy fix Hard fix Easier human verification!
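A sketch of this training objective under the proposition's choice of margin, reusing the TaxonomicRelatedness sketch above (again an illustrative reconstruction; shortest-path distances d(v, v′) are assumed precomputed as a tensor):

```python
import torch

def dynamic_margin_loss(model, e_u, e_v, e_vneg, spdist):
    """Hinge loss over (child u, parent v, non-parent v') triples with
    dynamic margin gamma(u, v, v') = shortest-path distance d(v, v')."""
    s_pos = model.score(e_u, e_v)      # s(u, v) for true parents
    s_neg = model.score(e_u, e_vneg)   # s(u, v') for sampled non-parents
    # [s(u, v') - s(u, v) + d(v, v')]_+ summed over triples; by the
    # proposition, this total upper-bounds the sum of shortest-path
    # distances between top-ranked predicted parents and true parents.
    return torch.clamp(s_neg - s_pos + spdist, min=0.0).sum()
```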
  70. Large-Margin Loss Infeasible to sample all non-parents v′

  71. Large-Margin Loss Negative sampling Infeasible to sample all non-parents v′

  72. Large-Margin Loss Negative sampling Loss rapidly drops to 0 (no “active” samples) Infeasible to sample all non-parents v′
  73. Large-Margin Loss Negative sampling Loss rapidly drops to 0 (no “active” samples) Distance-weighted sampling [Wu et al., 2017] Infeasible to sample all non-parents v′
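A sketch of distance-weighted negative sampling in the spirit of Wu et al. (2017): instead of sampling non-parents uniformly (whose hinge terms quickly become inactive), weight candidates so that hard negatives near the child in embedding space are drawn more often. The inverse-distance weighting below is a simplification of Wu et al.'s density-based weights, for illustration only:

```python
import numpy as np

def sample_negatives(e_u, candidate_embs, m, eps=1e-6):
    """Sample m non-parent indices, favoring candidates close to e_u,
    which are more likely to violate the margin ("active" samples)."""
    dists = np.linalg.norm(candidate_embs - e_u, axis=1)
    weights = 1.0 / (dists + eps)      # closer non-parents are harder
    probs = weights / weights.sum()
    return np.random.choice(len(candidate_embs), size=m,
                            replace=False, p=probs)
```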
  74. Implementation Details CODE / RESOURCES / SLIDES / VIDEO cmuarborist.github.io

  75. Outline 1. Modeling Taxonomic Relatedness 2. Learning, Prediction & Dynamic

    Margins 3. Evaluation
  76. Outline 1. Modeling Taxonomic Relatedness 2. Learning, Prediction & Dynamic

    Margins 3. Evaluation
  77. Datasets 3 Textual Taxonomies

                               Pinterest   SemEval   Mammal
      No. of edges             10768       18827     5765
      No. of nodes             10792       8154      5080
      Training nodes           7919        7374      4543
      Test nodes               2873        780       537
      Depth                    7           ∞         18
      Heterogeneous semantics  ✓           ✗         ✓
  78. Datasets Pinterest
  79. Datasets Pinterest Heterogeneous semantics
  80. Datasets Pinterest Heterogeneous semantics Nodes can be concrete (New York) or abstract (Mental Wellbeing)
  81. Datasets Pinterest Heterogeneous semantics Nodes can be concrete (New York) or abstract (Mental Wellbeing) PinText embeddings used for each node [Zhuang and Liu, 2019]
  82. Datasets SemEval
  83. Datasets SemEval From the SemEval 2018 hypernym discovery task
  84. Datasets SemEval From the SemEval 2018 hypernym discovery task Homogeneous “is-a” semantics
  85. Datasets SemEval From the SemEval 2018 hypernym discovery task Homogeneous “is-a” semantics FastText embeddings used for each node [Bojanowski et al., 2017]
  86. Datasets Mammal
  87. Datasets Mammal WordNet noun subgraph rooted at mammal.n.01
  88. Datasets Mammal WordNet noun subgraph rooted at mammal.n.01 3 edge types: is-a, is-part-of-whole, is-part-of-substance
  89. Datasets Mammal WordNet noun subgraph rooted at mammal.n.01 3 edge types: is-a, is-part-of-whole, is-part-of-substance FastText embeddings used for each node
  90. Datasets Evaluation Setup 15% of leaf nodes + outgoing edges held out for testing Remaining child-parent pairs used for training
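A minimal sketch of this split (my reconstruction of the stated protocol, not the authors' code):

```python
import random

def split_taxonomy(edges, leaves, test_frac=0.15, seed=0):
    """Hold out test_frac of leaf nodes (with their outgoing edges) as
    test queries; train on the remaining child-parent pairs."""
    rng = random.Random(seed)
    n_test = int(test_frac * len(leaves))
    test_leaves = set(rng.sample(sorted(leaves), n_test))
    train = [(u, v) for (u, v) in edges if u not in test_leaves]
    test = [(u, v) for (u, v) in edges if u in test_leaves]
    return train, test
```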
  91. Datasets Metrics • Mean Reciprocal Rank (MRR): from 0% to 100% (best) • Recall@15: from 0% to 100% (best) • Mean shortest-path distance between the top-ranked predicted parent and the true parent (SPDist): lower is better
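The three metrics can be sketched per test query as follows (assumed implementations; networkx is used for shortest-path distances on the undirected taxonomy graph, and the per-query values are averaged over the test set):

```python
import networkx as nx

def evaluate_query(ranked_nodes, true_parent, taxonomy):
    """ranked_nodes: taxonomy nodes sorted by decreasing s(q, v)."""
    rank = ranked_nodes.index(true_parent) + 1
    mrr = 1.0 / rank
    recall_at_15 = float(rank <= 15)
    # SPDist: distance from the top-ranked prediction to the true parent.
    spdist = nx.shortest_path_length(taxonomy.to_undirected(),
                                     ranked_nodes[0], true_parent)
    return mrr, recall_at_15, spdist
```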
  92. Evaluation I. Repurposing hypernym detectors II. Taxonomy expansion performance III.

    Example Predictions on Pinterest
  93. Evaluation I. Repurposing hypernym detectors II. Taxonomy expansion performance III.

    Example Predictions on Pinterest
  94. Hypernym Detectors Q. Can hypernym detectors be repurposed for taxonomy expansion? [Baroni et al., 2012; Roller et al., 2014; Weeds et al., 2014; Shwartz et al., 2016]
  95. Hypernym Detectors Classification F1 scores

      F1       Pinterest   SemEval   Mammal
      CONCAT   86.5%       59.3%     72.1%
      SUM      87.7%       60.6%     77.2%
      DIFF     87.0%       63.4%     75.7%
      PROD     86.0%       65.7%     78.0%
  96. Hypernym Detectors Classification F1 scores Vector operation + random forest classifier
  97. Hypernym Detectors Classification F1 scores Vector operation + random forest classifier Trained and tested on a balanced sample of node-pairs
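A plausible reconstruction of these baselines (illustrative, not the exact experimental code): apply one of four vector operations to a (child, candidate-parent) embedding pair and feed the result to a random forest classifier:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# The four vector operations compared in the F1 table above.
OPS = {
    'CONCAT': lambda u, v: np.concatenate([u, v]),
    'SUM':    lambda u, v: u + v,
    'DIFF':   lambda u, v: u - v,
    'PROD':   lambda u, v: u * v,
}

def train_detector(pairs, labels, op='DIFF'):
    """pairs: (e_child, e_parent) tuples from a balanced sample of
    true edges and non-edges; labels: 1 for true edges, else 0."""
    X = np.stack([OPS[op](u, v) for u, v in pairs])
    return RandomForestClassifier(n_estimators=100).fit(X, labels)
```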
  98. Hypernym Detectors Classification F1 scores
  99. Hypernym Detectors 1. Reasonably good performance overall
  100. Hypernym Detectors 1. Reasonably good performance overall 2. Better embeddings correlated with better performance
  101. Hypernym Detectors 1. Reasonably good performance overall 2. Better embeddings correlated with better performance 3. No single dominant hypernym detector
  102. Hypernym Detectors Mean Reciprocal Ranks

      MRR      Pinterest   SemEval   Mammal
      CONCAT   41.8%       21.0%     15.0%
      SUM      33.9%       17.8%     19.6%
      DIFF     41.2%       18.5%     31.4%
      PROD     42.2%       17.5%     32.2%
  103. Hypernym Detectors Mean Reciprocal Ranks Uncorrelated with classification performance
  104. Hypernym Detectors Mean Reciprocal Ranks Uncorrelated with classification performance Explicit formulation of taxonomy expansion as ranking needed
  105. Evaluation I. Repurposing hypernym detectors II. Taxonomy expansion performance III.

    Example Predictions on Pinterest
  106. Evaluation I. Repurposing hypernym detectors II. Taxonomy expansion performance III.

    Example Predictions on Pinterest
  107. Taxonomy Expansion Q. Does explicitly accommodating heterogeneous edge semantics help?

      MRR         Pinterest   SemEval   Mammal
      CRIM        53.2%       41.7%     21.3%
      This Work   59.0%       43.4%     29.4%
  108. Taxonomy Expansion Comparison with CRIM [Bernier-Colborne & Barriere, 2018]: models homogeneous edge semantics, skip-gram-negative-sampling-like loss function
  109. Taxonomy Expansion Comparison with CRIM [Bernier-Colborne & Barriere, 2018] Our method has better ranking performance when the taxonomy has heterogeneous edge semantics
  110. Taxonomy Expansion Comparison with CRIM [Bernier-Colborne & Barriere, 2018] Our method predicts parents that are closer to the true parents for taxonomies with heterogeneous edge semantics

      SPDist      Pinterest   SemEval   Mammal
      CRIM        2.4         2.7       4.1
      This Work   2.2         2.9       3.2
  111. Evaluation I. Repurposing hypernym detectors II. Taxonomy expansion performance III.

    Example Predictions on Pinterest
  112. Evaluation I. Repurposing hypernym detectors II. Taxonomy expansion performance III.

    Example Predictions on Pinterest
  113. Example Predictions Example results on Pinterest, correct parents in bold

      Query            Predicted Parents
      luxor            africa travel, european travel, asia travel, greece
      2nd month baby   baby stage, baby, baby names, preparing for baby
      depression       mental illness, stress, mental wellbeing, disease
      ramadan          hosting occasions, holiday, sukkot, middle east & african cuisine
      minion humor     humor, people humor, character humor, funny
  114. Example Predictions Concrete concept, “is-in” semantics
  115. Example Predictions Abstract concept, “is-type-of” semantics
  116. Example Predictions Example results on Pinterest, correct parents in bold
  117. Example Failures Example failures on Pinterest (no correct parent in top 4 predictions)

      Query                Predicted Parents
      artificial flowers   planting, dried flowers, DIY flowers, edible seeds
      thor                 adventure movie, action movie, science movie, adventure games
      smartwatch           wearable devices, phone accessories, electronics, computer
      disney makeup        halloween makeup, makeup, costume makeup, character makeup
      holocaust            history, german history, american history, world war
  118. Test for Data Leakage Predictions for Pinterest search queries not present in the taxonomy

      Query                     Predicted Parents
      what causes blackheads    skin concern, mental illness, feelings, disease
      meatloaf cupcakes         cupcakes, dessert, no bake meals, steak
      benefits of raw carrots   food and drinks, vegetables, diet, healthy recipes
      kids alarm clock          toddlers and preschoolers, child care, baby sleep issues, baby
      humorous texts            poems, quotes, authors, religious studies
  119. Test for Data Leakage Predictions for Pinterest search queries not present in the taxonomy
  120. More in Paper Ablation study Impact of hyperparameters Inferring taxonomic roles
  121. Summary

  122. Summary Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification
  123. Summary Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification Taxonomic Roles with Linear Maps
  124. Summary Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification Taxonomic Roles with Linear Maps Large-Margin Loss with Dynamic Margins
  125. Summary Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification Taxonomic Roles with Linear Maps Large-Margin Loss with Dynamic Margins Guarantees to Ease Human Verification
  126. CODE / RESOURCES / SLIDES / VIDEO cmuarborist.github.io emaad@cmu.edu