Yu Liu, Mark A. Musen, Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, and David Temple. Use of OWL and Semantic Web Technologies at Pinterest. International Semantic Web Conference, 2019. [Best Paper, In-Use Track]

The Pinterest taxonomy: 6,000 nodes, curated by 8 curators over 1 month.
Negative sampling: it is infeasible to sample all non-parents v′, and uniform sampling yields no “active” samples, so we use distance-weighted sampling [Wu et al., 2017] instead of uniform sampling.
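A minimal sketch of distance-weighted negative sampling in the spirit of Wu et al. (2017), which draws negatives with probability proportional to the inverse of the density of pairwise distances on the unit sphere; the function name, distance cutoff, and weight cap are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def sample_negatives(query_emb, cand_embs, num_samples,
                     cutoff=0.5, cap=100.0, seed=0):
    """Distance-weighted negative sampling (after Wu et al., 2017):
    candidates are drawn with probability proportional to the inverse of
    the density q(d) of pairwise distances on the unit sphere, instead of
    uniformly. Assumes L2-normalized rows in `cand_embs`."""
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(cand_embs - query_emb, axis=1)
    d = np.clip(d, cutoff, 1.99)   # avoid log(0) at the extremes
    n = cand_embs.shape[1]         # embedding dimension
    # log q(d) on the sphere S^{n-1}: q(d) ∝ d^{n-2} * (1 - d^2/4)^{(n-3)/2}
    log_q = (n - 2) * np.log(d) + 0.5 * (n - 3) * np.log(1.0 - 0.25 * d**2)
    weights = np.minimum(np.exp(-log_q), cap)  # inverse density, capped
    probs = weights / weights.sum()
    return rng.choice(len(cand_embs), size=num_samples, replace=False, p=probs)
```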
Evaluation setup: leaf child-parent pairs are held out for testing; the remaining child-parent pairs are used for training.

                            Pinterest   SemEval   Mammal
  No. of edges                  10768     18827     5765
  No. of nodes                  10792      8154     5080
  Training nodes                 7919      7374     4543
  Test nodes                     2873       780      537
  Depth                             7         ∞       18
  Heterogeneous semantics           ✓         ✗        ✓
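A sketch of the leaf-holdout split described above, assuming the taxonomy is a networkx DiGraph with edges pointing from parent to child; the function name and test fraction are illustrative:

```python
import random
import networkx as nx

def split_taxonomy(taxonomy: nx.DiGraph, test_fraction=0.25, seed=0):
    """Hold out a sample of leaf child-parent pairs for testing; all
    remaining child-parent pairs form the training taxonomy."""
    rng = random.Random(seed)
    # Leaves have no children (out-degree 0 under parent -> child edges).
    leaves = [n for n in taxonomy.nodes if taxonomy.out_degree(n) == 0]
    test_leaves = set(rng.sample(leaves, int(test_fraction * len(leaves))))
    test_pairs = [(child, parent) for parent, child in taxonomy.edges
                  if child in test_leaves]
    train_graph = taxonomy.copy()
    train_graph.remove_nodes_from(test_leaves)
    return train_graph, test_pairs
```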
Classification F1 scores: a single vector operation on the node-pair features plus a random forest classifier, trained and tested on a balanced sample of node-pairs (columns follow the dataset order above).

           Pinterest   SemEval   Mammal
  SUM          87.7%     60.6%    77.2%
  DIFF         87.0%     63.4%    75.7%
  PROD         86.0%     65.7%    78.0%
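A sketch of this classification test, assuming node-pair feature vectors are numpy arrays; the helper names are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def edge_features(child_vec, parent_vec, op="SUM"):
    """Combine a node-pair's feature vectors with one vector operation."""
    if op == "SUM":
        return child_vec + parent_vec
    if op == "DIFF":
        return child_vec - parent_vec
    if op == "PROD":
        return child_vec * parent_vec  # element-wise product
    raise ValueError(op)

def pairwise_f1(pos_pairs, neg_pairs, op="SUM", seed=0):
    """Train/test a random forest on a balanced sample of positive and
    negative node-pairs and report the classification F1 score."""
    X = np.vstack([edge_features(c, p, op) for c, p in pos_pairs + neg_pairs])
    y = np.array([1] * len(pos_pairs) + [0] * len(neg_pairs))
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    clf = RandomForestClassifier(random_state=seed).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te))
```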
Comparison with CRIM [Bernier-Colborne & Barriere, 2018], which models homogeneous edge semantics with a skip-gram-negative-sampling-like loss function.

Ranking performance, This Work: 59.0% / 43.4% / 29.4% (Pinterest / SemEval / Mammal). Our method has better ranking performance when the taxonomy has heterogeneous edge semantics.
Comparison with CRIM [Bernier-Colborne & Barriere, 2018] on the average undirected shortest-path distance between predicted and true parents (SPDist; lower is better), This Work: 2.2 / 2.9 / 3.2 (Pinterest / SemEval / Mammal). Our method predicts parents that are closer to the true parents for taxonomies with heterogeneous edge semantics.
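A short sketch of the SPDist metric, assuming a networkx taxonomy and dicts mapping each test query to its top-ranked predicted parent and to its set of true parents:

```python
import networkx as nx

def avg_spdist(taxonomy: nx.DiGraph, predictions: dict, true_parents: dict):
    """Average undirected shortest-path distance between each query's
    top-ranked predicted parent and its closest true parent."""
    undirected = taxonomy.to_undirected()
    dists = [min(nx.shortest_path_length(undirected, pred, t)
                 for t in true_parents[query])
             for query, pred in predictions.items()]
    return sum(dists) / len(dists)
```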
  Query                     Predicted Parents
  what causes blackheads    skin concern, mental illness, feelings, disease
  meatloaf cupcakes         cupcakes, dessert, no bake meals, steak
  benefits of raw carrots   food and drinks, vegetables, diet, healthy recipes
  kids alarm clock          toddlers and preschoolers, child care, baby sleep issues, baby
  humorous texts            poems, quotes, authors, religious studies

Test for Data Leakage
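Predictions like the table above can be produced by ranking every taxonomy node against an unseen query. A minimal sketch, where `score_fn` stands in for the trained relatedness function s(q, v) and the query is embedded with the same feature extractor as taxonomy nodes; names are hypothetical:

```python
import numpy as np

def predict_parents(query_vec, node_ids, score_fn, top_k=4):
    """Rank all taxonomy nodes by taxonomic relatedness to an unseen query
    and return the top-k candidate parents."""
    scores = np.array([score_fn(query_vec, v) for v in node_ids])
    top = np.argsort(-scores)[:top_k]
    return [node_ids[i] for i in top]
```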
We train all methods for 150 epochs (for Pinterest and SemEval) or 500 epochs (for Mammal) and select the trained model at the epoch with the highest validation MRR (see appendix for details).

Taxonomy expansion results are reported in Table 4. Overall, Arborist and CRIM improve over the hypernym detectors on all datasets and evaluation metrics, by over 200% in some cases. This justifies explicitly optimizing for the taxonomy expansion ranking task, and representing taxonomic relationships with more complex functions of the node-pair feature-vectors. Arborist outperforms CRIM on all datasets and evaluation metrics. Notably, Arborist gracefully degrades to similar performance as CRIM on the SemEval taxonomy with homogeneous edge semantics.

Table 5 reports the top-ranked parents predicted by Arborist on Pinterest for both accurately and inaccurately predicted test queries (true parents are emphasized in bold). The results showcase predictions on a variety of node types present in the Pinterest taxonomy, from concrete entities such as locations (Luxor) and fictional characters (Thor) to abstract concepts such as depression. We observe that even inaccurately predicted parents conform to some notion of relatedness and immediate hierarchy, suggesting potentially missing edges in the taxonomy.

We also showcase Arborist's predictions for search queries made on Pinterest that are not present in the taxonomy (Table 5, bottom). Qualitatively, Arborist is able to accurately associate unseen natural-language queries with potentially related nodes in the Pinterest taxonomy. Of note is the search query what causes blackheads, which is not just associated with its obvious parent skin concern, but also with the very relevant parent feelings.

5.4 Ablation Study

The performance of Arborist may be attributed to two key modeling choices: (i) learning node-specific embeddings w_v to capture heterogeneous edge semantics, and (ii) optimizing a large-margin loss with dynamic margins.

[Figure 2: Ablation study of Arborist on Pinterest: (a) summary of the ablation study; (b) MRR for each value of the constant margin vs. dynamic margins; (c) uniform vs. distance-weighted negative sampling.]

[Figure 3: Effect of the number of linear maps k (top-left), the number of negative samples m (top-right), and the training-data fraction (bottom-left) on the MRR of Arborist on Pinterest. Also shown (bottom-right) is the average undirected shortest-path distance between predicted and true test parents (SPDist) with training epoch.]

Ablation study · Impact of hyperparameters · Inferring taxonomic roles
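Since model selection above uses validation MRR, here is a minimal sketch of that metric, assuming each test query maps to a full ranking of candidate parents and a set of true parents:

```python
def mean_reciprocal_rank(rankings: dict, true_parents: dict):
    """MRR over test queries: the reciprocal rank of the best-ranked true
    parent for each query, averaged across queries."""
    rr = []
    for query, ranking in rankings.items():
        ranks = [i + 1 for i, v in enumerate(ranking)
                 if v in true_parents[query]]
        rr.append(1.0 / min(ranks) if ranks else 0.0)
    return sum(rr) / len(rr)
```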
Taxonomic Roles with Linear Maps

[Title slide: Expanding Taxonomies with Implicit Edge Semantics. Emaad Manzoor, Rui Li, Dhananjay Shrouty, Jure Leskovec. Pinterest, Stanford University.]

[Model figure: a taxonomy snippet with nodes BALL, NBA, and SHAQ, whose edges IsTypeOf, IsPlayerOf, and IsLeagueOf carry implicit edge semantics. A query q (e.g. SHAQ) with feature vector e_q is passed through the linear maps P_1, P_2, P_3, ..., P_K and combined with the node embedding w_NBA to produce the taxonomic relatedness score s(q, NBA).]
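A sketch of scoring with k shared linear maps and node-specific embeddings, as depicted in the figure. The exact way Arborist combines the mapped query features with w_v is specified in the paper; the per-node softmax weighting over maps below is an illustrative assumption consistent with the figure, not the paper's parameterization:

```python
import numpy as np

class LinearMapScorer:
    """Taxonomic relatedness via k shared linear maps P_1..P_k and
    node-specific embeddings w_v (combination scheme assumed)."""
    def __init__(self, k, feat_dim, emb_dim, num_nodes, seed=0):
        rng = np.random.default_rng(seed)
        self.P = rng.normal(size=(k, emb_dim, feat_dim)) * 0.01   # shared maps
        self.w = rng.normal(size=(num_nodes, emb_dim)) * 0.01     # node embeddings
        self.alpha = rng.normal(size=(num_nodes, k)) * 0.01       # per-node map weights (assumed)

    def score(self, e_q, v):
        """s(q, v): relatedness between query features e_q and node v."""
        mapped = self.P @ e_q                 # shape (k, emb_dim)
        weights = np.exp(self.alpha[v])
        weights /= weights.sum()              # softmax over the k maps
        return float(self.w[v] @ (weights @ mapped))
```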
Large-Margin Loss with Dynamic Margins

... and $\gamma(u, v, v')$ is the desired margin, defined as a function of the child, parent, and non-parent nodes. We now derive the loss function to be minimized in order to satisfy the large-margin constraint (5). Denote by $E(u, v, v')$ the degree to which a non-parent node $v'$ violates the large-margin constraint of child-parent pair $(u, v)$:

$E(u, v, v') = \max[0,\; s(u, v') - s(u, v) + \gamma(u, v, v')]$.  (6)

When the large-margin constraint is satisfied, $E(u, v, v') = 0$ and the non-parent incurs no violation. Otherwise, $E(u, v, v') > 0$. The overall loss function $\mathcal{L}(T)$ is the total violation of the large-margin constraints by the non-parents corresponding to every child-parent pair $(u, v)$:

$\mathcal{L}(T) = \sum_{(u,v) \in E} \sum_{v' \in V \setminus H(u)} E(u, v, v')$  (7)

The node embeddings $w_v$ and linear maps $P_1, \ldots, P_k$ are jointly trained to minimize $\mathcal{L}(T)$ via gradient descent. Given the trained parameters and a query node $q \notin V$ having feature-vector $e_q$, predictions are made by ranking the taxonomy nodes $v$ in decreasing order of their taxonomic relatedness $s(q, v)$.
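A direct sketch of Eqs. (6) and (7), where `score` and `margin` are callables standing in for the trained s(., .) and the margin function gamma(., ., .), and `non_parents` maps each child to its (sampled) non-parent nodes:

```python
def margin_violation(s_uv, s_uvp, gamma):
    """E(u, v, v') = max[0, s(u, v') - s(u, v) + gamma(u, v, v')]  (Eq. 6)."""
    return max(0.0, s_uvp - s_uv + gamma)

def total_loss(edges, non_parents, score, margin):
    """L(T): total large-margin violation over all child-parent pairs (u, v)
    and their non-parents v' (Eq. 7)."""
    loss = 0.0
    for u, v in edges:
        s_uv = score(u, v)
        for vp in non_parents[u]:
            loss += margin_violation(s_uv, score(u, vp), margin(u, v, vp))
    return loss
```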
Guarantees to Ease Human Verification

Constant margins must be tuned, or learned from the data [19]. We propose a principled dynamic margin function that requires no tuning, learning, or heuristics. We relate the margins to shortest-path distances in the taxonomy between predicted and true parent nodes. Denote by $d(\cdot, \cdot)$ the undirected shortest-path distance between two nodes in the taxonomy. With the following theorem, we bound the undirected shortest-path distance between the highest-ranked predicted parent $\hat{v}(u) = \arg\max_{v' \in V} s(u, v')$ and any true parent, for every child node $u$:

Proposition 1. When $\gamma(u, v, v') = d(v, v')$, $\mathcal{L}(T)$ is an upper bound on the sum of the undirected shortest-path distances between the highest-ranked predicted parents and the true parents:

$\sum_{(u,v) \in E} d(v, \hat{v}(u)) \le \mathcal{L}(T)$.

Thus, minimizing $\mathcal{L}(T)$ also minimizes an upper bound on the sum of shortest-path distances between the predicted and true parent nodes in the ground-truth taxonomy: the dynamic margin encourages non-parent nodes near the true parent to be scored relatively higher than distant ones. This guarantee eases verification by human experts; if Arborist predicts an incorrect parent node, the taxonomists need only explore the taxonomy neighborhood around the predicted parent to find the correct one.
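The dynamic margin itself is just an undirected shortest-path distance, so it plugs straight into the loss sketch above; a minimal sketch using networkx (the variable `G` for the training taxonomy is assumed):

```python
import networkx as nx

def dynamic_margin(undirected_taxonomy, v, v_prime):
    """gamma(u, v, v') = d(v, v'): the undirected shortest-path distance
    between the true parent v and the non-parent v'; no tuning required."""
    return nx.shortest_path_length(undirected_taxonomy, v, v_prime)

# Usage with total_loss from the previous sketch:
#   undirected = G.to_undirected()
#   margin = lambda u, v, vp: dynamic_margin(undirected, v, vp)
# With this margin, Proposition 1 says total_loss(...) upper-bounds the
# summed shortest-path distance between predicted and true parents.
```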