Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, and David Temple. Use of OWL and Semantic Web Technologies at Pinterest. International Semantic Web Conference, 2019. [Best Paper, In-Use Track]

The Pinterest taxonomy: 6,000 nodes, curated by 8 curators over 1 month.
Negative sampling: it is infeasible to sample all non-parents v′, and uniform sampling yields no “active” samples, so we use distance-weighted sampling [Wu et al., 2017] instead of uniform sampling.
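A minimal sketch of distance-weighted negative sampling in the spirit of Wu et al. (2017), which draws negatives with probability proportional to the inverse of the density of pairwise distances on the unit sphere; the function name, distance cutoff, and weight cap are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sample_negatives(query_emb, cand_embs, num_samples,
                     cutoff=0.5, cap=100.0, seed=0):
    """Distance-weighted negative sampling (after Wu et al., 2017):
    candidates are drawn with probability proportional to the inverse of
    the density q(d) of pairwise distances on the unit sphere, instead of
    uniformly. Assumes L2-normalized rows in `cand_embs`."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(cand_embs - query_emb, axis=1)
    d = np.clip(d, cutoff, 1.99)   # avoid log(0) at the extremes
    n = cand_embs.shape[1]         # embedding dimension
    # log q(d) on the sphere S^{n-1}: q(d) ∝ d^{n-2} * (1 - d^2/4)^{(n-3)/2}
    log_q = (n - 2) * np.log(d) + 0.5 * (n - 3) * np.log(1.0 - 0.25 * d**2)
    weights = np.minimum(np.exp(-log_q), cap)  # inverse density, capped
    probs = weights / weights.sum()
    return rng.choice(len(cand_embs), size=num_samples, replace=False, p=probs)
```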
Evaluation setup: leaf child-parent pairs are held out for testing; the remaining child-parent pairs are used for training.

                            Pinterest   SemEval   Mammal
  No. of edges                  10768     18827     5765
  No. of nodes                  10792      8154     5080
  Training nodes                 7919      7374     4543
  Test nodes                     2873       780      537
  Depth                             7         ∞       18
  Heterogeneous semantics           ✓         ✗        ✓
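A sketch of the leaf-holdout split described above, assuming the taxonomy is a networkx DiGraph with edges pointing from parent to child; the function name and test fraction are illustrative:

```python
import random
import networkx as nx

def split_taxonomy(taxonomy: nx.DiGraph, test_fraction=0.25, seed=0):
    """Hold out a sample of leaf child-parent pairs for testing; all
    remaining child-parent pairs form the training taxonomy."""
    rng = random.Random(seed)
    # Leaves have no children (out-degree 0 under parent -> child edges).
    leaves = [n for n in taxonomy.nodes if taxonomy.out_degree(n) == 0]
    test_leaves = set(rng.sample(leaves, int(test_fraction * len(leaves))))
    test_pairs = [(child, parent) for parent, child in taxonomy.edges
                  if child in test_leaves]
    train_graph = taxonomy.copy()
    train_graph.remove_nodes_from(test_leaves)
    return train_graph, test_pairs
```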
Classification F1 scores: a single vector operation on the node-pair features plus a random forest classifier, trained and tested on a balanced sample of node-pairs (columns follow the dataset order above).

           Pinterest   SemEval   Mammal
  SUM          87.7%     60.6%    77.2%
  DIFF         87.0%     63.4%    75.7%
  PROD         86.0%     65.7%    78.0%
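A sketch of this classification test, assuming node-pair feature vectors are numpy arrays; the helper names are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def edge_features(child_vec, parent_vec, op="SUM"):
    """Combine a node-pair's feature vectors with one vector operation."""
    if op == "SUM":
        return child_vec + parent_vec
    if op == "DIFF":
        return child_vec - parent_vec
    if op == "PROD":
        return child_vec * parent_vec  # element-wise product
    raise ValueError(op)

def pairwise_f1(pos_pairs, neg_pairs, op="SUM", seed=0):
    """Train/test a random forest on a balanced sample of positive and
    negative node-pairs and report the classification F1 score."""
    X = np.vstack([edge_features(c, p, op) for c, p in pos_pairs + neg_pairs])
    y = np.array([1] * len(pos_pairs) + [0] * len(neg_pairs))
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te))
```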
Comparison with CRIM [Bernier-Colborne & Barriere, 2018], which models homogeneous edge semantics with a skip-gram-negative-sampling-like loss function.

Ranking performance, This Work: 59.0% / 43.4% / 29.4% (Pinterest / SemEval / Mammal). Our method has better ranking performance when the taxonomy has heterogeneous edge semantics.
Comparison with CRIM [Bernier-Colborne & Barriere, 2018] on the average undirected shortest-path distance between predicted and true parents (SPDist; lower is better), This Work: 2.2 / 2.9 / 3.2 (Pinterest / SemEval / Mammal). Our method predicts parents that are closer to the true parents for taxonomies with heterogeneous edge semantics.
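A short sketch of the SPDist metric, assuming a networkx taxonomy and dicts mapping each test query to its top-ranked predicted parent and to its set of true parents:

```python
import networkx as nx

def avg_spdist(taxonomy: nx.DiGraph, predictions: dict, true_parents: dict):
    """Average undirected shortest-path distance between each query's
    top-ranked predicted parent and its closest true parent."""
    undirected = taxonomy.to_undirected()
    dists = [min(nx.shortest_path_length(undirected, pred, t)
                 for t in true_parents[query])
             for query, pred in predictions.items()]
    return sum(dists) / len(dists)
```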
  Query                     Predicted Parents
  what causes blackheads    skin concern, mental illness, feelings, disease
  meatloaf cupcakes         cupcakes, dessert, no bake meals, steak
  benefits of raw carrots   food and drinks, vegetables, diet, healthy recipes
  kids alarm clock          toddlers and preschoolers, child care, baby sleep issues, baby
  humorous texts            poems, quotes, authors, religious studies

Test for Data Leakage
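Predictions like the table above can be produced by ranking every taxonomy node against an unseen query. A minimal sketch, where `score_fn` stands in for the trained relatedness function s(q, v) and the query is embedded with the same feature extractor as taxonomy nodes; names are hypothetical:

```python
import numpy as np

def predict_parents(query_vec, node_ids, score_fn, top_k=4):
    """Rank all taxonomy nodes by taxonomic relatedness to an unseen query
    and return the top-k candidate parents."""
    scores = np.array([score_fn(query_vec, v) for v in node_ids])
    top = np.argsort(-scores)[:top_k]
    return [node_ids[i] for i in top]
```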
We train all methods for 150 epochs (for Pinterest and SemEval) or 500 epochs (for Mammal) and select the trained model at the epoch with the highest validation MRR (see appendix for details).

Taxonomy expansion results are reported in Table 4. Overall, Arborist and CRIM improve over the hypernym detectors on all datasets and evaluation metrics, by over 200% in some cases. This justifies explicitly optimizing for the taxonomy expansion ranking task, and representing taxonomic relationships with more complex functions of the node-pair feature-vectors. Arborist outperforms CRIM on all datasets and evaluation metrics. Notably, Arborist gracefully degrades to similar performance as CRIM on the SemEval taxonomy with homogeneous edge semantics.

Table 5 reports the top-ranked parents predicted by Arborist on Pinterest for both accurately and inaccurately predicted test queries (true parents are emphasized in bold). The results showcase predictions on a variety of node types present in the Pinterest taxonomy, from concrete entities such as locations (Luxor) and fictional characters (Thor) to abstract concepts such as depression. We observe that even inaccurately predicted parents conform to some notion of relatedness and immediate hierarchy, suggesting potentially missing edges in the taxonomy.

We also showcase Arborist's predictions for search queries made on Pinterest that are not present in the taxonomy (Table 5, bottom). Qualitatively, Arborist is able to accurately associate unseen natural-language queries with potentially related nodes in the Pinterest taxonomy. Of note is the search query what causes blackheads, which is not just associated with its obvious parent skin concern, but also with the very relevant parent feelings.

5.4 Ablation Study

The performance of Arborist may be attributed to two key modeling choices: (i) learning node-specific embeddings w_v to capture heterogeneous edge semantics, and (ii) optimizing a large-margin loss with dynamic margins.

[Figure 2: Ablation study of Arborist on Pinterest: (a) summary of the ablation study; (b) MRR for each value of the constant margin vs. dynamic margins; (c) uniform vs. distance-weighted negative sampling.]

[Figure 3: Effect of the number of linear maps k (top-left), the number of negative samples m (top-right), and the training-data fraction (bottom-left) on the MRR of Arborist on Pinterest. Also shown (bottom-right) is the average undirected shortest-path distance between predicted and true test parents (SPDist) with training epoch.]

Ablation study · Impact of hyperparameters · Inferring taxonomic roles
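Since model selection above uses validation MRR, here is a minimal sketch of that metric, assuming each test query maps to a full ranking of candidate parents and a set of true parents:

```python
def mean_reciprocal_rank(rankings: dict, true_parents: dict):
    """MRR over test queries: the reciprocal rank of the best-ranked true
    parent for each query, averaged across queries."""
    rr = []
    for query, ranking in rankings.items():
        ranks = [i + 1 for i, v in enumerate(ranking)
                 if v in true_parents[query]]
        rr.append(1.0 / min(ranks) if ranks else 0.0)
    return sum(rr) / len(rr)
```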
Taxonomic Roles with Linear Maps

[Title slide: Expanding Taxonomies with Implicit Edge Semantics. Emaad Manzoor, Rui Li, Dhananjay Shrouty, Jure Leskovec. Pinterest, Stanford University.]

[Model figure: a taxonomy snippet with nodes BALL, NBA, and SHAQ, whose edges IsTypeOf, IsPlayerOf, and IsLeagueOf carry implicit edge semantics. A query q (e.g. SHAQ) with feature vector e_q is passed through the linear maps P_1, P_2, P_3, ..., P_K and combined with the node embedding w_NBA to produce the taxonomic relatedness score s(q, NBA).]
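A sketch of scoring with k shared linear maps and node-specific embeddings, as depicted in the figure. The exact way Arborist combines the mapped query features with w_v is specified in the paper; the per-node softmax weighting over maps below is an illustrative assumption consistent with the figure, not the paper's parameterization:

```python
import numpy as np

class LinearMapScorer:
    """Taxonomic relatedness via k shared linear maps P_1..P_k and
    node-specific embeddings w_v (combination scheme assumed)."""
    def __init__(self, k, feat_dim, emb_dim, num_nodes, seed=0):
        rng = np.random.default_rng(seed)
        self.P = rng.normal(size=(k, emb_dim, feat_dim)) * 0.01   # shared maps
        self.w = rng.normal(size=(num_nodes, emb_dim)) * 0.01     # node embeddings
        self.alpha = rng.normal(size=(num_nodes, k)) * 0.01       # per-node map weights (assumed)

    def score(self, e_q, v):
        """s(q, v): relatedness between query features e_q and node v."""
        mapped = self.P @ e_q                 # shape (k, emb_dim)
        weights = np.exp(self.alpha[v])
        weights /= weights.sum()              # softmax over the k maps
        return float(self.w[v] @ (weights @ mapped))
```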
Large-Margin Loss with Dynamic Margins

... and $\gamma(u, v, v')$ is the desired margin, defined as a function of the child, parent, and non-parent nodes. We now derive the loss function to be minimized in order to satisfy the large-margin constraint (5). Denote by $E(u, v, v')$ the degree to which a non-parent node $v'$ violates the large-margin constraint of child-parent pair $(u, v)$:

$E(u, v, v') = \max[0,\; s(u, v') - s(u, v) + \gamma(u, v, v')]$.  (6)

When the large-margin constraint is satisfied, $E(u, v, v') = 0$ and the non-parent incurs no violation. Otherwise, $E(u, v, v') > 0$. The overall loss function $\mathcal{L}(T)$ is the total violation of the large-margin constraints by the non-parents corresponding to every child-parent pair $(u, v)$:

$\mathcal{L}(T) = \sum_{(u,v) \in E} \sum_{v' \in V \setminus H(u)} E(u, v, v')$  (7)

The node embeddings $w_v$ and linear maps $P_1, \ldots, P_k$ are jointly trained to minimize $\mathcal{L}(T)$ via gradient descent. Given the trained parameters and a query node $q \notin V$ having feature-vector $e_q$, predictions are made by ranking the taxonomy nodes $v$ in decreasing order of their taxonomic relatedness $s(q, v)$.
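A direct sketch of Eqs. (6) and (7), where `score` and `margin` are callables standing in for the trained s(., .) and the margin function gamma(., ., .), and `non_parents` maps each child to its (sampled) non-parent nodes:

```python
def margin_violation(s_uv, s_uvp, gamma):
    """E(u, v, v') = max[0, s(u, v') - s(u, v) + gamma(u, v, v')]  (Eq. 6)."""
    return max(0.0, s_uvp - s_uv + gamma)

def total_loss(edges, non_parents, score, margin):
    """L(T): total large-margin violation over all child-parent pairs (u, v)
    and their non-parents v' (Eq. 7)."""
    loss = 0.0
    for u, v in edges:
        s_uv = score(u, v)
        for vp in non_parents[u]:
            loss += margin_violation(s_uv, score(u, vp), margin(u, v, vp))
    return loss
```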
Guarantees to Ease Human Verification

Constant margins must be tuned, or learned from the data [19]. We propose a principled dynamic margin function that requires no tuning, learning, or heuristics. We relate the margins to shortest-path distances in the taxonomy between predicted and true parent nodes. Denote by $d(\cdot, \cdot)$ the undirected shortest-path distance between two nodes in the taxonomy. With the following theorem, we bound the undirected shortest-path distance between the highest-ranked predicted parent $\hat{v}(u) = \arg\max_{v' \in V} s(u, v')$ and any true parent, for every child node $u$:

Proposition 1. When $\gamma(u, v, v') = d(v, v')$, $\mathcal{L}(T)$ is an upper bound on the sum of the undirected shortest-path distances between the highest-ranked predicted parents and the true parents:

$\sum_{(u,v) \in E} d(v, \hat{v}(u)) \le \mathcal{L}(T)$.

Thus, minimizing $\mathcal{L}(T)$ also minimizes an upper bound on the sum of shortest-path distances between the predicted and true parent nodes in the ground-truth taxonomy: the dynamic margin encourages non-parent nodes near the true parent to be scored relatively higher than distant ones. This guarantee eases verification by human experts; if Arborist predicts an incorrect parent node, the taxonomists need only explore the taxonomy neighborhood around the predicted parent to find the correct one.
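The dynamic margin itself is just an undirected shortest-path distance, so it plugs straight into the loss sketch above; a minimal sketch using networkx (the variable `G` for the training taxonomy is assumed):

```python
import networkx as nx

def dynamic_margin(undirected_taxonomy, v, v_prime):
    """gamma(u, v, v') = d(v, v'): the undirected shortest-path distance
    between the true parent v and the non-parent v'; no tuning required."""
    return nx.shortest_path_length(undirected_taxonomy, v, v_prime)

# Usage with total_loss from the previous sketch:
#   undirected = G.to_undirected()
#   margin = lambda u, v, vp: dynamic_margin(undirected, v, vp)
# With this margin, Proposition 1 says total_loss(...) upper-bounds the
# summed shortest-path distance between predicted and true parents.
```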