Slide 1

Expanding Taxonomies with Implicit Edge Semantics
Emaad Manzoor, Dhananjay Shrouty, Rui Li, Jure Leskovec

Slide 2

Expanding Taxonomies with Implicit Edge Semantics
Emaad Manzoor, Dhananjay Shrouty, Rui Li, Jure Leskovec

Slide 3

Taxonomies
Collection of related concepts

Slide 4

Taxonomies
Collection of related concepts
Geographic entities: Asia, S. Asia, E. Asia, India, Pakistan, Nepal, Mumbai, Delhi (maps.google.com)

Slide 5

Taxonomies
Collection of related concepts
Musical genres: Rock, Punk, Alternative, Grunge, Rapcore, Indie, Instrumental, Vocal (musicmap.info)

Slide 6

Taxonomies
Collection of related concepts
Product categories: Apparel, Clothing, Jewelry, Active, Sleep, Formal, Cycling, Football

Slide 7

Taxonomies
Collection of related concepts structured as a directed graph

Slide 8

Taxonomies
Collection of related concepts structured as a directed graph
PARENT → CHILD

Slide 9

Taxonomies
Collection of related concepts structured as a directed graph encoding a hierarchy
PARENT → CHILD

Slide 10

Taxonomies
Collection of related concepts structured as a directed graph encoding a hierarchy
PARENT: (i) related to, and (ii) more general than the CHILD
[Distributional Inclusion Hypothesis; Geffet & Dagan, ACL 2005]

Slide 11

Taxonomies
Additional assumption: each concept has a non-taxonomic feature vector
(e.g., word embeddings, image embeddings)

Slide 12

Taxonomies
Help improve performance in:
• Classification [Babbar et al., 2013]
• Recommendations [He et al., 2016]
• Search [Agrawal et al., 2009]
• User modeling [Menon et al., 2011]

Slide 13

The Pinterest Taxonomy
Helps improve performance in:
• Classification [Babbar et al., 2013]
• Recommendations [He et al., 2016]
• Search [Agrawal et al., 2009]
• User modeling [Menon et al., 2011]

Slide 14

The Pinterest Taxonomy
Hierarchy of interests
~11,000 nodes/edges
7 levels deep
100% expert curated

Slide 15

100% expert curated

Slide 16

100% expert curated
8 curators, 1 month, 6,000 nodes
Rafael S. Gonçalves, Matthew Horridge, Rui Li, Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, and David Temple. Use of OWL and Semantic Web Technologies at Pinterest. International Semantic Web Conference, 2019. [Best Paper, In-Use Track]

Slide 17

Problem

Slide 18

Problem
Given a taxonomy

Slide 19

Problem
Given a taxonomy with node feature vectors

Slide 20

Problem
Given a taxonomy with node feature vectors and an unseen query node q

Slide 21

Problem
Given a taxonomy with node feature vectors and an unseen query node q,
rank the taxonomy nodes such that true parents of q are ranked high

Slide 22

Problem
Easy human verification: want predicted parents near true parents, quantified by the shortest-path distance from the top-ranked prediction

Slide 23

Challenges

Slide 24

Challenges
Lexical memorization
Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan. Do supervised distributional methods really learn lexical inference relations? NAACL-HLT 2015.

Slide 25

Challenges
Lexical memorization
Edge semantics are heterogeneous

Slide 26

Challenges
Lexical memorization
Edge semantics are heterogeneous
Paris → France (is-in)

Slide 27

Challenges
Lexical memorization
Edge semantics are heterogeneous
Paris → France (is-in)
Ronaldo → Sportsman (is-a)

Slide 28

Challenges
Lexical memorization
Edge semantics are heterogeneous and unobserved
Paris → France (is-in)
Ronaldo → Sportsman (is-a)

Slide 29

Challenges
Lexical memorization
Edge semantics are heterogeneous and unobserved
Want to learn these semantics from the natural organization used by taxonomists to serve business needs

Slide 30

Challenges
Lexical memorization
Edge semantics are heterogeneous and unobserved
Need predictions for humans (query q, true parent; easy fix vs. hard fix)

Slide 31

Outline
1. Modeling Taxonomic Relatedness
2. Learning, Prediction & Dynamic Margins
3. Evaluation

Slide 32

Outline
1. Modeling Taxonomic Relatedness
2. Learning, Prediction & Dynamic Margins
3. Evaluation

Slide 33

Taxonomic Relatedness
Child u, parent v, with node feature vectors e_u, e_v

Slide 34

Taxonomic Relatedness
Relatedness score s(u, v)

Slide 35

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M) · e_v

Slide 36

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M) · e_v
M is learned from data

Slide 37

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M) · e_v
Assumes homogeneous edge semantics

Slide 38

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v

Slide 39

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v
M_v is a node-local linear map

Slide 40

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v
O(d² |V|) parameters

Slide 41

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v

Slide 42

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v
M_v = Σ_{i=1..k} w_v[i] × P_i

Slide 43

Taxonomic Relatedness
M_v = Σ_{i=1..k} w_v[i] × P_i

Slide 44

Taxonomic Relatedness
M_v = Σ_{i=1..k} w_v[i] × P_i
M_v: transformation matrix of node v

Slide 45

Taxonomic Relatedness
M_v = Σ_{i=1..k} w_v[i] × P_i
M_v: transformation matrix of node v
k: latent edge semantics

Slide 46

Taxonomic Relatedness
M_v = Σ_{i=1..k} w_v[i] × P_i
M_v: transformation matrix of node v
k: latent edge semantics
P_i: linear map for semantic type i

Slide 47

Taxonomic Relatedness
M_v = Σ_{i=1..k} w_v[i] × P_i
M_v: transformation matrix of node v
k: latent edge semantics
P_i: linear map for semantic type i
w_v: taxonomic "role" of parent v

Slide 48

Taxonomic Relatedness
w_v = f(e_v), the taxonomic "role" of parent v; f is any learnable function

Slide 49

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v
M_v = Σ_{i=1..k} w_v[i] × P_i

Slide 50

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v
M_v = Σ_{i=1..k} w_v[i] × P_i
To learn: P_1, …, P_k and f : ℝ^d → ℝ^k

Slide 51

Taxonomic Relatedness
Relatedness score s(u, v) = (e_u M_v) · e_v
M_v = Σ_{i=1..k} w_v[i] × P_i
O(d²k + |f|) parameters; information-sharing across nodes; robust to noise
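
The scoring model above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the dimensions are made up, and f is taken to be a minimal linear map (the slides only require f to be learnable).

```python
# Sketch of s(u, v) = (e_u M_v) . e_v with M_v = sum_i w_v[i] * P_i.
# Shapes and the linear choice of f are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 3                      # feature dimension, number of latent edge-semantic types

P = rng.normal(size=(k, d, d))   # shared linear maps P_1..P_k
W_f = rng.normal(size=(d, k))    # minimal linear f: w_v = e_v @ W_f

def relatedness(e_u, e_v):
    """Score s(u, v), mixing the shared maps by the parent's role w_v = f(e_v)."""
    w_v = e_v @ W_f                          # (k,) role weights of parent v
    M_v = np.tensordot(w_v, P, axes=1)       # (d, d) node-local map, sum_i w_v[i] * P_i
    return (e_u @ M_v) @ e_v

e_child, e_parent = rng.normal(size=d), rng.normal(size=d)
print(relatedness(e_child, e_parent))
```

Because only the k maps P_i and f are learned (rather than one d×d matrix per node), the parameter count drops from O(d²|V|) to O(d²k + |f|), matching the slide.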

Slide 52

Outline
1. Modeling Taxonomic Relatedness
2. Learning, Prediction & Dynamic Margins
3. Evaluation

Slide 53

Outline
1. Modeling Taxonomic Relatedness
2. Learning, Prediction & Dynamic Margins
3. Evaluation

Slide 54

Large-Margin Loss

Slide 55

Large-Margin Loss
Desired constraint: s(child, parent) > s(child, non-parent) + γ

Slide 56

Large-Margin Loss
Violated constraint: s(child, parent) ≤ s(child, non-parent) + γ

Slide 57

Large-Margin Loss
Constraint violation: [ s(child, non-parent) − s(child, parent) + γ ]₊

Slide 58

Large-Margin Loss
Loss function: Σ_{(u,v,v′)} [ s(u, v′) − s(u, v) + γ ]₊

Slide 59

Large-Margin Loss
How to pick the margin γ?

Slide 60

Large-Margin Loss
How to pick the margin γ?
Option 1: Heuristic constant

Slide 61

Large-Margin Loss
How to pick the margin γ?
Option 1: Heuristic constant
Option 2: Tune on validation set

Slide 62

Large-Margin Loss
How to pick the margin γ?
Option 1: Heuristic constant
Option 2: Tune on validation set
Option 3: Learn from data

Slide 63

Large-Margin Loss
How to pick the margin γ?
Option 1: Heuristic constant
Option 2: Tune on validation set
Option 3: Learn from data
Our approach: Dynamic margins

Slide 64

Large-Margin Loss
Dynamic margin γ(u, v, v′)

Slide 65

Large-Margin Loss
γ(u, v, v′) = shortest-path distance(v, v′)

Slide 66

Large-Margin Loss
Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), …

Slide 67

Large-Margin Loss
Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), then
loss ≥ Σ_{(u,v)} shortest-path distance(v, v̂(u))

Slide 68

Large-Margin Loss
Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), then
loss ≥ Σ_{(u,v)} shortest-path distance(v, v̂(u))
v: true parent; v̂(u): predicted parent

Slide 69

Large-Margin Loss
Proposition: if γ(u, v, v′) = shortest-path distance(v, v′), then
loss ≥ Σ_{(u,v)} shortest-path distance(v, v̂(u))
Easier human verification!
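
The loss with dynamic margins can be sketched directly from the formula on the slides; this is an assumed reconstruction, not the paper's implementation, with `score` and `spdist` left as caller-supplied functions.

```python
# Sketch of the large-margin loss with dynamic margins:
# gamma(u, v, v') = shortest-path distance d(v, v').
def dynamic_margin_loss(triples, score, spdist):
    """triples: iterable of (child u, true parent v, non-parent v').
    score(u, x): taxonomic relatedness s(u, x).
    spdist(v, v_neg): undirected shortest-path distance, used as the margin."""
    total = 0.0
    for u, v, v_neg in triples:
        # Hinge: only constraints violated by at least the margin contribute.
        total += max(0.0, score(u, v_neg) - score(u, v) + spdist(v, v_neg))
    return total
```

With this choice of margin, the proposition on the slide applies: minimizing the loss also minimizes an upper bound on the summed shortest-path distance between each child's top-ranked predicted parent and a true parent.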

Slide 70

Large-Margin Loss
Infeasible to sample all non-parents v′

Slide 71

Large-Margin Loss
Infeasible to sample all non-parents v′
Negative sampling

Slide 72

Large-Margin Loss
Infeasible to sample all non-parents v′
Negative sampling: loss rapidly drops to 0 — no "active" samples

Slide 73

Large-Margin Loss
Infeasible to sample all non-parents v′
Negative sampling: loss rapidly drops to 0 — no "active" samples
Distance-weighted sampling [Wu et al., 2017]
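
One way to keep negatives "active" is to bias sampling toward candidates near the child in embedding space. Wu et al. (2017) weight by the inverse density of pairwise distances; the exponential weighting below is an illustrative simplification of that idea, and all names and shapes here are assumptions.

```python
# Simplified sketch of distance-weighted negative sampling: favor nearby
# ("hard") non-parents so hinge constraints stay active. The exp(-d)
# weighting is an illustrative stand-in for Wu et al.'s inverse-density scheme.
import numpy as np

def sample_negatives(e_child, cand_embs, cand_ids, m, rng):
    dist = np.linalg.norm(cand_embs - e_child, axis=1)  # child-to-candidate distances
    w = np.exp(-dist)                                   # nearer candidates weighted higher
    p = w / w.sum()
    return rng.choice(cand_ids, size=m, replace=False, p=p)
```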

Slide 74

Implementation Details
CODE / RESOURCES / SLIDES / VIDEO: cmuarborist.github.io

Slide 75

Outline
1. Modeling Taxonomic Relatedness
2. Learning, Prediction & Dynamic Margins
3. Evaluation

Slide 76

Outline
1. Modeling Taxonomic Relatedness
2. Learning, Prediction & Dynamic Margins
3. Evaluation

Slide 77

Datasets: 3 textual taxonomies

                         Pinterest   SemEval   Mammal
No. of edges             10768       18827     5765
No. of nodes             10792       8154      5080
Training nodes           7919        7374      4543
Test nodes               2873        780       537
Depth                    7           ∞         18
Heterogeneous semantics  ✓           ✗         ✓

Slide 78

Datasets: Pinterest

Slide 79

Datasets: Pinterest
Heterogeneous semantics

Slide 80

Datasets: Pinterest
Heterogeneous semantics
Nodes can be concrete (New York) or abstract (Mental Wellbeing)

Slide 81

Datasets: Pinterest
Heterogeneous semantics
Nodes can be concrete (New York) or abstract (Mental Wellbeing)
PinText embeddings used for each node [Zhuang and Liu, 2019]

Slide 82

Datasets: SemEval

Slide 83

Datasets: SemEval
From the SemEval 2018 hypernym discovery task

Slide 84

Datasets: SemEval
From the SemEval 2018 hypernym discovery task
Homogeneous "is-a" semantics

Slide 85

Datasets: SemEval
From the SemEval 2018 hypernym discovery task
Homogeneous "is-a" semantics
FastText embeddings used for each node [Bojanowski et al., 2017]

Slide 86

Datasets: Mammal

Slide 87

Datasets: Mammal
WordNet noun subgraph rooted at mammal.n.01

Slide 88

Datasets: Mammal
WordNet noun subgraph rooted at mammal.n.01
3 edge types: is-a, is-part-of-whole, is-part-of-substance

Slide 89

Datasets: Mammal
WordNet noun subgraph rooted at mammal.n.01
3 edge types: is-a, is-part-of-whole, is-part-of-substance
FastText embeddings used for each node

Slide 90

Datasets: Evaluation setup
15% of leaf nodes + outgoing edges held out for testing
Remaining child-parent pairs used for training

Slide 91

Datasets: Metrics
• Mean Reciprocal Rank (MRR): from 0% to 100% (best)
• Recall@15: from 0% to 100% (best)
• Mean shortest-path distance (SPDist)
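
The two percentage metrics can be sketched as follows, under assumed conventions (not the paper's evaluation code): MRR uses the best-ranked true parent per query, Recall@k counts queries with any true parent in the top k, and both are reported on the slide's 0%-100% scale. SPDist additionally requires the taxonomy graph and is omitted here.

```python
# Sketch of MRR and Recall@15 over per-query ranked prediction lists.
import numpy as np

def mrr(ranked_lists, true_parents):
    """Mean reciprocal rank (%), using the best-ranked true parent per query."""
    rr = []
    for ranked, parents in zip(ranked_lists, true_parents):
        ranks = [ranked.index(p) + 1 for p in parents if p in ranked]
        rr.append(1.0 / min(ranks) if ranks else 0.0)
    return 100.0 * float(np.mean(rr))

def recall_at(ranked_lists, true_parents, k=15):
    """Fraction of queries (%) with at least one true parent in the top k."""
    hits = [any(p in ranked[:k] for p in parents)
            for ranked, parents in zip(ranked_lists, true_parents)]
    return 100.0 * float(np.mean(hits))
```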

Slide 92

Evaluation
I. Repurposing hypernym detectors
II. Taxonomy expansion performance
III. Example predictions on Pinterest

Slide 93

Evaluation
I. Repurposing hypernym detectors
II. Taxonomy expansion performance
III. Example predictions on Pinterest

Slide 94

Hypernym Detectors
Q. Can hypernym detectors be repurposed for taxonomy expansion?
[Baroni et al., 2012; Roller et al., 2014; Weeds et al., 2014; Shwartz et al., 2016]

Slide 95

Hypernym Detectors
Classification F1 scores

F1      Pinterest  SemEval  Mammal
CONCAT  86.5%      59.3%    72.1%
SUM     87.7%      60.6%    77.2%
DIFF    87.0%      63.4%    75.7%
PROD    86.0%      65.7%    78.0%

Slide 96

Hypernym Detectors
Classification F1 scores
Vector operation + random forest classifier

Slide 97

Hypernym Detectors
Classification F1 scores
Vector operation + random forest classifier
Trained and tested on a balanced sample of node pairs

Slide 98

Hypernym Detectors

Slide 99

Hypernym Detectors
1. Reasonably good performance overall

Slide 100

Hypernym Detectors
1. Reasonably good performance overall
2. Better embeddings correlated with better performance

Slide 101

Hypernym Detectors
1. Reasonably good performance overall
2. Better embeddings correlated with better performance
3. No single dominant hypernym detector

Slide 102

Hypernym Detectors
Mean Reciprocal Ranks

MRR     Pinterest  SemEval  Mammal
CONCAT  41.8%      21.0%    15.0%
SUM     33.9%      17.8%    19.6%
DIFF    41.2%      18.5%    31.4%
PROD    42.2%      17.5%    32.2%

Slide 103

Hypernym Detectors
Mean Reciprocal Ranks: uncorrelated with classification performance

Slide 104

Hypernym Detectors
Mean Reciprocal Ranks: uncorrelated with classification performance
An explicit formulation of taxonomy expansion as ranking is needed

Slide 105

Evaluation
I. Repurposing hypernym detectors
II. Taxonomy expansion performance
III. Example predictions on Pinterest

Slide 106

Evaluation
I. Repurposing hypernym detectors
II. Taxonomy expansion performance
III. Example predictions on Pinterest

Slide 107

Taxonomy Expansion
Q. Does explicitly accommodating heterogeneous edge semantics help?

MRR        Pinterest  SemEval  Mammal
CRIM       53.2%      41.7%    21.3%
This Work  59.0%      43.4%    29.4%

Slide 108

Taxonomy Expansion
Comparison with CRIM [Bernier-Colborne & Barriere, 2018]:
models homogeneous edge semantics, with a skip-gram-negative-sampling-like loss function

Slide 109

Taxonomy Expansion
Comparison with CRIM [Bernier-Colborne & Barriere, 2018]
Our method has better ranking performance when the taxonomy has heterogeneous edge semantics

Slide 110

Taxonomy Expansion
Comparison with CRIM [Bernier-Colborne & Barriere, 2018]
Our method predicts parents that are closer to the true parents for taxonomies with heterogeneous edge semantics

SPDist     Pinterest  SemEval  Mammal
CRIM       2.4        2.7      4.1
This Work  2.2        2.9      3.2

Slide 111

Evaluation
I. Repurposing hypernym detectors
II. Taxonomy expansion performance
III. Example predictions on Pinterest

Slide 112

Evaluation
I. Repurposing hypernym detectors
II. Taxonomy expansion performance
III. Example predictions on Pinterest

Slide 113

Example Predictions
Example results on Pinterest, correct parents in bold

Query               Predicted Parents
luxor               africa travel, european travel, asia travel, greece
2nd month baby      baby stage, baby, baby names, preparing for baby
depression          mental illness, stress, mental wellbeing, disease
ramadan             hosting occasions, holiday, sukkot, middle east & african cuisine
minion humor        humor, people humor, character humor, funny

Slide 114

Example Predictions
Concrete concept, "is-in" semantics

Slide 115

Example Predictions
Abstract concept, "is-type-of" semantics

Slide 116

Example Predictions
Example results on Pinterest, correct parents in bold

Slide 117

Example Failures
Example failures on Pinterest (no correct parent in top 4 predictions)

Query               Predicted Parents
artificial flowers  planting, dried flowers, DIY flowers, edible seeds
thor                adventure movie, action movie, science movie, adventure games
smartwatch          wearable devices, phone accessories, electronics, computer
disney makeup       halloween makeup, makeup, costume makeup, character makeup
holocaust           history, german history, american history, world war

Slide 118

Test for Data Leakage
Predictions for Pinterest search queries not present in the taxonomy

Query                    Predicted Parents
what causes blackheads   skin concern, mental illness, feelings, disease
meatloaf cupcakes        cupcakes, dessert, no bake meals, steak
benefits of raw carrots  food and drinks, vegetables, diet, healthy recipes
kids alarm clock         toddlers and preschoolers, child care, baby sleep issues, baby
humorous texts           poems, quotes, authors, religious studies

Slide 119

Test for Data Leakage
Predictions for Pinterest search queries not present in the taxonomy

Slide 120

More in Paper
• Ablation study
• Impact of hyperparameters
• Inferring taxonomic roles

Slide 121

Summary

Slide 122

Summary
Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification

Slide 123

Summary
Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification
Taxonomic roles with linear maps

Slide 124

Summary
Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification
Taxonomic roles with linear maps
Large-margin loss with dynamic margins

Slide 125

Summary
Expand taxonomies with heterogeneous, unobserved edge semantics for human-in-the-loop verification
Taxonomic roles with linear maps
Large-margin loss with dynamic margins
Guarantees to ease human verification

Slide 126

CODE / RESOURCES / SLIDES / VIDEO: cmuarborist.github.io
[email protected]