
Expanding Taxonomies with Implicit Edge Semantics

25-minute talk at The Web Conference (WWW) 2020.
Project website: http://cmuarborist.github.io

Emaad Manzoor

April 23, 2020

Transcript

  1. Expanding Taxonomies with Implicit Edge Semantics
    Emaad Manzoor
    Dhananjay Shrouty
    Rui Li
    Jure Leskovec


  3. Taxonomies
    Collection of related concepts


  4. Taxonomies
    Collection of related concepts
    Example: geographic entities (maps.google.com)
      Asia
      S. Asia, E. Asia
      India, Pakistan, Nepal
      Mumbai, Delhi

  5. Taxonomies
    Collection of related concepts
    Example: musical genres (musicmap.info)
      Rock
      Punk, Alternative
      Grunge, Rapcore, Indie
      Instrumental, Vocal

  6. Taxonomies
    Collection of related concepts
    Example: product categories
      Apparel
      Clothing, Jewelry
      Active, Sleep, Formal
      Cycling, Football

  7. Taxonomies
    Collection of related concepts
    structured as a directed graph


  8. Taxonomies
    Collection of related concepts
    structured as a directed graph
    PARENT → CHILD

  9. Taxonomies
    Collection of related concepts
    structured as a directed graph
    encoding a hierarchy
    PARENT → CHILD

  10. Taxonomies
    Collection of related concepts
    structured as a directed graph
    encoding a hierarchy
    PARENT → CHILD: the parent is (i) related to, and (ii) more general than, the child
    [Distributional Inclusion Hypothesis; Geffet & Dagan, ACL 2005]
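As a concrete illustration, the geographic example above can be held as a small directed graph of parent → child edges; a minimal sketch using networkx (variable names are illustrative):

    import networkx as nx

    # A toy taxonomy: directed parent -> child edges encoding the hierarchy.
    taxonomy = nx.DiGraph()
    taxonomy.add_edges_from([
        ("Asia", "S. Asia"), ("Asia", "E. Asia"),
        ("S. Asia", "India"), ("S. Asia", "Pakistan"), ("S. Asia", "Nepal"),
        ("India", "Mumbai"), ("India", "Delhi"),
    ])
    print(list(taxonomy.successors("S. Asia")))  # children of S. Asia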

  11. Taxonomies
    Additional Assumption
    Each concept has a non-taxonomic feature vector
    (e.g., word embeddings, image embeddings)

  12. Taxonomies
    Help improve performance in:
    • Classification [Babbar et al., 2013]
    • Recommendations [He et al., 2016]
    • Search [Agrawal et al., 2009]
    • User modeling [Menon et al., 2011]


  14. The Pinterest Taxonomy
    Hierarchy of interests
    ~11,000 nodes/edges
    7 levels deep
    100% expert curated


  16. 100% expert curated
    8 curators, 1 month, 6,000 nodes
    Rafael S. Gonçalves, Matthew Horridge, Rui Li, Yu Liu, Mark A. Musen,
    Csongor I. Nyulas, Evelyn Obamos, Dhananjay Shrouty, and David Temple.
    Use of OWL and Semantic Web Technologies at Pinterest.
    International Semantic Web Conference, 2019. [Best Paper, In-Use Track]

  17. Problem


  18. Problem
    Given a taxonomy


  19. Problem
    Given a taxonomy with node feature vectors

  20. Problem
    Given a taxonomy with node feature vectors and an unseen query node q

  21. Problem
    Given a taxonomy with node feature vectors and an unseen query node q,
    rank the taxonomy nodes such that the true parents of q are ranked high

  22. Problem
    Easy human verification: want predicted parents near true parents,
    quantified by the shortest-path distance from the top-ranked prediction

  23. Challenges


  24. Challenges
    Lexical memorization
    [Omer Levy, Steffen Remus, Chris Biemann, and Ido Dagan.
    Do supervised distributional methods really learn lexical inference relations?
    NAACL-HLT 2015.]

  25. Challenges
    Lexical memorization
    Edge semantics are heterogeneous

  26. Challenges
    Lexical memorization
    Edge semantics are heterogeneous
    Paris → France (is-in)

  27. Challenges
    Lexical memorization
    Edge semantics are heterogeneous
    Paris → France (is-in)
    Ronaldo → Sportsman (is-a)

  28. Challenges
    Lexical memorization
    Edge semantics are heterogeneous and unobserved
    Paris → France (is-in)
    Ronaldo → Sportsman (is-a)

  29. Challenges
    Lexical memorization
    Edge semantics are heterogeneous and unobserved
    Paris → France (is-in)
    Ronaldo → Sportsman (is-a)
    Want to learn these semantics from the natural organization
    used by taxonomists to serve business needs

  30. Challenges
    Lexical memorization
    Edge semantics are heterogeneous and unobserved
    Need predictions for humans
    [Diagram: a query q predicted near its true parent is an easy fix;
    predicted far away, a hard fix]

  31. Outline
    1. Modeling Taxonomic Relatedness
    2. Learning, Prediction & Dynamic Margins
    3. Evaluation



  33. Taxonomic Relatedness
    [Diagram: child u with feature vector e_u, parent v with feature vector e_v]

  34. Taxonomic Relatedness
    Relatedness score s(u, v)

  35. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M) · e_v

  36. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M) · e_v
    M: learned from data

  37. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M) · e_v
    A single shared M assumes homogeneous edge semantics

  38. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M_v) · e_v

  39. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M_v) · e_v
    M_v: node-local linear map

  40. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M_v) · e_v
    O(d² |V|) parameters


  42. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M_v) · e_v
    M_v = Σ_{i=1..k} w_v[i] × P_i

  43. Taxonomic Relatedness
    M_v = Σ_{i=1..k} w_v[i] × P_i

  44. Taxonomic Relatedness
    M_v = Σ_{i=1..k} w_v[i] × P_i
    M_v: transformation matrix of node v

  45. Taxonomic Relatedness
    M_v = Σ_{i=1..k} w_v[i] × P_i
    M_v: transformation matrix of node v
    k: number of latent edge semantics

  46. Taxonomic Relatedness
    M_v = Σ_{i=1..k} w_v[i] × P_i
    M_v: transformation matrix of node v
    k: number of latent edge semantics
    P_i: linear map for semantic type i

  47. Taxonomic Relatedness
    M_v = Σ_{i=1..k} w_v[i] × P_i
    M_v: transformation matrix of node v
    k: number of latent edge semantics
    P_i: linear map for semantic type i
    w_v: taxonomic “role” of parent v

  48. Taxonomic Relatedness
    w_v = f(e_v): taxonomic “role” of parent v
    f is any learnable function

  49. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M_v) · e_v
    M_v = Σ_{i=1..k} w_v[i] × P_i

  50. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M_v) · e_v
    M_v = Σ_{i=1..k} w_v[i] × P_i
    To learn: P_1, …, P_k and f : ℝ^d → ℝ^k

  51. Taxonomic Relatedness
    Relatedness score: s(u, v) = (e_u M_v) · e_v
    M_v = Σ_{i=1..k} w_v[i] × P_i
    O(d²k + |f|) parameters
    Information-sharing across nodes; robust to noise
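To make the model concrete, here is a minimal NumPy sketch of this scoring function. The function names and the choice of f as a linear map followed by a softmax are illustrative assumptions, not the exact Arborist implementation (the real code is at cmuarborist.github.io):

    import numpy as np

    rng = np.random.default_rng(0)
    d, k = 64, 4  # feature dimension d, number of latent semantic types k

    # Learnable parameters (randomly initialized here only for illustration):
    P = rng.normal(scale=0.1, size=(k, d, d))  # one linear map P_i per semantic type
    W_f = rng.normal(scale=0.1, size=(d, k))   # f as a simple linear map (an assumption)

    def role(e_v):
        # Taxonomic "role" w_v = f(e_v); a softmax keeps the k weights normalized.
        z = e_v @ W_f
        z = np.exp(z - z.max())
        return z / z.sum()

    def relatedness(e_u, e_v):
        # s(u, v) = (e_u M_v) . e_v with M_v = sum_i w_v[i] * P_i
        M_v = np.tensordot(role(e_v), P, axes=1)  # (d, d) mixture of the k maps
        return (e_u @ M_v) @ e_v

    e_u, e_v = rng.normal(size=d), rng.normal(size=d)
    print(relatedness(e_u, e_v))  # a scalar relatedness score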

  52. Outline
    1. Modeling Taxonomic Relatedness
    2. Learning, Prediction & Dynamic Margins
    3. Evaluation



  54. Large-Margin Loss


  55. Large-Margin Loss
    Desired constraint: s(child, parent) > s(child, non-parent) + γ

  56. Large-Margin Loss
    Violated constraint: s(child, parent) ≯ s(child, non-parent) + γ

  57. Large-Margin Loss
    Constraint violation: [s(child, non-parent) + γ − s(child, parent)]+

  58. Large-Margin Loss
    Loss function: Σ_{(u,v,v′)} [s(u, v′) + γ − s(u, v)]+
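As a sketch, the violation incurred by one (child, parent, non-parent) triplet is a simple hinge, reusing the relatedness function from the earlier snippet:

    def violation(e_child, e_parent, e_nonparent, gamma):
        # [s(child, non-parent) + gamma - s(child, parent)]_+
        return max(0.0, relatedness(e_child, e_nonparent) + gamma
                        - relatedness(e_child, e_parent))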

  59. Large-Margin Loss
    How to pick the margin γ?

  60. Large-Margin Loss
    How to pick the margin γ?
    Option 1: Heuristic constant

  61. Large-Margin Loss
    How to pick the margin γ?
    Option 1: Heuristic constant
    Option 2: Tune on validation set

  62. Large-Margin Loss
    How to pick the margin γ?
    Option 1: Heuristic constant
    Option 2: Tune on validation set
    Option 3: Learn from data

  63. Large-Margin Loss
    How to pick the margin γ?
    Option 1: Heuristic constant
    Option 2: Tune on validation set
    Option 3: Learn from data
    Our approach: Dynamic margins

  64. Large-Margin Loss
    γ(u, v, v′)

  65. Large-Margin Loss
    γ(u, v, v′) = shortest-path distance(v, v′)

  66. Large-Margin Loss
    Proposition: if γ(u, v, v′) = shortest-path distance(v, v′),

  67. Large-Margin Loss
    Proposition: if γ(u, v, v′) = shortest-path distance(v, v′),
    then loss ≥ Σ_{(u,v)} shortest-path distance(v, v̂(u))

  68. Large-Margin Loss
    Proposition: if γ(u, v, v′) = shortest-path distance(v, v′),
    then loss ≥ Σ_{(u,v)} shortest-path distance(v, v̂(u))
    v: true parent; v̂(u): predicted parent

  69. Large-Margin Loss
    Proposition: if γ(u, v, v′) = shortest-path distance(v, v′),
    then loss ≥ Σ_{(u,v)} shortest-path distance(v, v̂(u))
    Easier human verification!
    [Diagram: a query q predicted near its true parent is an easy fix;
    predicted far away, a hard fix]
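A sketch of the dynamic margin and the resulting total loss, assuming the taxonomy is held as a networkx graph and features maps node ids to feature vectors (names are illustrative; violation is from the earlier snippet):

    import networkx as nx

    def dynamic_margin(taxonomy, v, v_neg):
        # gamma(u, v, v') = undirected shortest-path distance between v and v'.
        return nx.shortest_path_length(taxonomy.to_undirected(), v, v_neg)

    def total_loss(taxonomy, triplets, features):
        # Sum of hinge violations over sampled (child, parent, non-parent)
        # triplets, each with its own shortest-path-distance margin.
        return sum(
            violation(features[u], features[v], features[v_neg],
                      dynamic_margin(taxonomy, v, v_neg))
            for u, v, v_neg in triplets)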

  70. Large-Margin Loss
    Infeasible to sample all non-parents v′


  71. Large-Margin Loss
    Negative sampling
    Infeasible to sample all non-parents v′


  72. Large-Margin Loss
    Negative sampling
    Loss rapidly drops to 0
    — no “active” samples
    Infeasible to sample all non-parents v′


  73. Large-Margin Loss
    Infeasible to sample all non-parents v′
    Negative sampling
    Loss rapidly drops to 0 — no “active” samples
    Distance-weighted sampling [Wu et al., 2017]
    [Background: paper excerpt and ablation figure panels, (b) MRR for each value of
    constant margin vs. dynamic margins, (c) uniform vs. distance-weighted
    negative sampling]
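The snippet below is a simplified stand-in for the distance-weighted negative-sampling idea of Wu et al. (2017): prefer non-parents whose feature vectors lie close to the child, since such "hard" negatives keep the hinge loss active. The softmax weighting and temperature are illustrative assumptions:

    import numpy as np

    def sample_negative(e_child, candidate_ids, candidate_embs, rng, temp=1.0):
        # Closer candidates get higher sampling probability (harder negatives).
        dists = np.linalg.norm(candidate_embs - e_child, axis=1)
        logits = -dists / temp
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(candidate_ids, p=probs)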

  74. Implementation Details
    CODE / RESOURCES / SLIDES / VIDEO
    cmuarborist.github.io


  75. Outline
    1. Modeling Taxonomic Relatedness
    2. Learning, Prediction & Dynamic Margins
    3. Evaluation



  77. Datasets
    3 Textual Taxonomies

                      Pinterest   SemEval   Mammal
    No. of edges          10768     18827     5765
    No. of nodes          10792      8154     5080
    Training nodes         7919      7374     4543
    Test nodes             2873       780      537
    Depth                     7         ∞       18
    Heterogeneous
    semantics                 ✓         ✗        ✓


  81. Datasets: Pinterest
    Heterogeneous semantics
    Nodes can be concrete (New York) or abstract (Mental Wellbeing)
    PinText embeddings used for each node [Zhuang and Liu, 2019]


  85. Datasets: SemEval
    From the SemEval 2018 hypernym discovery task
    Homogeneous “is-a” semantics
    FastText embeddings used for each node [Bojanowski et al., 2017]


  89. Datasets: Mammal
    WordNet noun subgraph rooted at mammal.n.01
    3 edge types: is-a, is-part-of-whole, is-part-of-substance
    FastText embeddings used for each node

  90. Datasets: Evaluation Setup
    15% of leaf nodes + outgoing edges held out for testing
    Remaining child-parent pairs used for training
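A sketch of this hold-out protocol, assuming the taxonomy is a networkx DiGraph with parent → child edges (function and variable names are illustrative):

    import networkx as nx
    import numpy as np

    def split_taxonomy(taxonomy, test_frac=0.15, seed=0):
        # Hold out a fraction of leaves, together with the edges that attach
        # them to their parents, as test queries; train on the rest.
        rng = np.random.default_rng(seed)
        leaves = [n for n in taxonomy if taxonomy.out_degree(n) == 0]
        n_test = int(test_frac * len(leaves))
        test_nodes = set(rng.choice(leaves, size=n_test, replace=False))
        train_edges = [(p, c) for p, c in taxonomy.edges if c not in test_nodes]
        test_pairs = [(c, p) for p, c in taxonomy.edges if c in test_nodes]
        return train_edges, test_pairs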

  91. Datasets: Metrics
    • Mean Reciprocal Rank (MRR): from 0% to 100% (best)
    • Recall@15: from 0% to 100% (best)
    • Mean shortest-path distance (SPDist): lower is better
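Minimal illustrative sketches of MRR and SPDist (the exact evaluation scripts ship with the project code); each test query is assumed to come with a ranked list of candidate parents and a set of true parents:

    import numpy as np
    import networkx as nx

    def mean_reciprocal_rank(rankings, true_parents):
        # Reciprocal rank of the best-ranked true parent, averaged over queries.
        rrs = []
        for ranking, truths in zip(rankings, true_parents):
            ranks = [ranking.index(t) + 1 for t in truths if t in ranking]
            rrs.append(1.0 / min(ranks) if ranks else 0.0)
        return float(np.mean(rrs))

    def mean_sp_dist(taxonomy, top_predictions, true_parents):
        # Undirected shortest-path distance from the top prediction to the
        # nearest true parent, averaged over queries (lower is better).
        G = taxonomy.to_undirected()
        dists = [min(nx.shortest_path_length(G, pred, t) for t in truths)
                 for pred, truths in zip(top_predictions, true_parents)]
        return float(np.mean(dists))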

  92. Evaluation
    I. Repurposing hypernym detectors
    II. Taxonomy expansion performance
    III. Example Predictions on Pinterest



  94. Hypernym Detectors
    Q. Can hypernym detectors be repurposed for taxonomy expansion?
    [Baroni et al., 2012; Roller et al., 2014; Weeds et al., 2014; Shwartz et al., 2016]

  95. Hypernym Detectors
    Classification F1 scores

    F1        Pinterest   SemEval   Mammal
    CONCAT        86.5%     59.3%    72.1%
    SUM           87.7%     60.6%    77.2%
    DIFF          87.0%     63.4%    75.7%
    PROD          86.0%     65.7%    78.0%

  96. Hypernym Detectors
    Vector operation + random forest classifier

  97. Hypernym Detectors
    Vector operation + random forest classifier
    Trained and tested on a balanced sample of node-pairs
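A sketch of these baselines, pairing each vector operation with a random-forest classifier over (child, candidate-parent) pairs; names and hyperparameters are illustrative:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def pair_features(e_child, e_parent, op):
        # The four vector operations evaluated above.
        ops = {
            "concat": lambda: np.concatenate([e_child, e_parent]),
            "sum":    lambda: e_child + e_parent,
            "diff":   lambda: e_parent - e_child,
            "prod":   lambda: e_child * e_parent,
        }
        return ops[op]()

    # X: pair features for a balanced sample of node pairs; y: 1 iff true parent.
    def train_detector(X, y):
        return RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)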


  99. Hypernym Detectors
    1. Reasonably good performance overall

  100. Hypernym Detectors
    1. Reasonably good performance overall
    2. Better embeddings correlated with better performance

  101. Hypernym Detectors
    1. Reasonably good performance overall
    2. Better embeddings correlated with better performance
    3. No single dominant hypernym detector

  102. Hypernym Detectors
    Mean Reciprocal Ranks

    MRR       Pinterest   SemEval   Mammal
    CONCAT        41.8%     21.0%    15.0%
    SUM           33.9%     17.8%    19.6%
    DIFF          41.2%     18.5%    31.4%
    PROD          42.2%     17.5%    32.2%

  103. Hypernym Detectors
    MRR is uncorrelated with classification performance

  104. Hypernym Detectors
    MRR is uncorrelated with classification performance
    An explicit formulation of taxonomy expansion as ranking is needed

  105. Evaluation
    I. Repurposing hypernym detectors
    II. Taxonomy expansion performance
    III. Example Predictions on Pinterest



  107. Taxonomy Expansion
    Q. Does explicitly accommodating heterogeneous edge semantics help?

    MRR         Pinterest   SemEval   Mammal
    CRIM            53.2%     41.7%    21.3%
    This work       59.0%     43.4%    29.4%

  108. Taxonomy Expansion
    Comparison with CRIM [Bernier-Colborne & Barriere, 2018]:
    models homogeneous edge semantics with a
    skip-gram-negative-sampling-like loss function

  109. Taxonomy Expansion
    Comparison with CRIM [Bernier-Colborne & Barriere, 2018]:
    our method has better ranking performance when the
    taxonomy has heterogeneous edge semantics

  110. Taxonomy Expansion
    Comparison with CRIM [Bernier-Colborne & Barriere, 2018]:
    our method predicts parents closer to the true parents
    for taxonomies with heterogeneous edge semantics

    SPDist      Pinterest   SemEval   Mammal
    CRIM              2.4       2.7      4.1
    This work         2.2       2.9      3.2

  111. Evaluation
    I. Repurposing hypernym detectors
    II. Taxonomy expansion performance
    III. Example Predictions on Pinterest



  113. Example Predictions
    Example results on Pinterest, correct parents in bold

    Query            Predicted Parents
    luxor            africa travel, european travel, asia travel, greece
    2nd month baby   baby stage, baby, baby names, preparing for baby
    depression       mental illness, stress, mental wellbeing, disease
    ramadan          hosting occasions, holiday, sukkot, middle east & african cuisine
    minion humor     humor, people humor, character humor, funny

  114. Example Predictions
    Concrete concept, “is-in” semantics:
    luxor → africa travel, european travel, asia travel, greece

  115. Example Predictions
    Abstract concept, “is-type-of” semantics:
    depression → mental illness, stress, mental wellbeing, disease


  117. Example Failures
    Example failures on Pinterest (no correct parent in top 4 predictions)

    Query                Predicted Parents
    artificial flowers   planting, dried flowers, DIY flowers, edible seeds
    thor                 adventure movie, action movie, science movie, adventure games
    smartwatch           wearable devices, phone accessories, electronics, computer
    disney makeup        halloween makeup, makeup, costume makeup, character makeup
    holocaust            history, german history, american history, world war

  118. Test for Data Leakage
    Predictions for Pinterest search queries not present in the taxonomy

    Query                     Predicted Parents
    what causes blackheads    skin concern, mental illness, feelings, disease
    meatloaf cupcakes         cupcakes, dessert, no bake meals, steak
    benefits of raw carrots   food and drinks, vegetables, diet, healthy recipes
    kids alarm clock          toddlers and preschoolers, child care, baby sleep issues, baby
    humorous texts            poems, quotes, authors, religious studies


  120. More in Paper
    Ablation study
    Impact of hyperparameters
    Inferring taxonomic roles
    [Background: excerpt from the WWW ’20 paper, lightly repaired:]
    “… methods for 150 epochs (for Pinterest and SemEval) or 500 epochs (for Mammal)
    and select the trained model at the epoch with the highest validation MRR (see
    appendix for details). Taxonomy expansion results are reported in Table 4. Overall,
    Arborist and CRIM improve over the hypernym detectors on all datasets and evaluation
    metrics, by over 200% in some cases. This justifies explicitly optimizing for the
    taxonomy expansion ranking task, and representing taxonomic relationships with more
    complex functions of the node-pair feature-vectors. Arborist outperforms CRIM on all
    datasets and evaluation metrics. Notably, Arborist gracefully degrades to similar
    performance as CRIM on the SemEval taxonomy with homogeneous edge semantics.
    Table 5 reports the top-ranked predicted parents by Arborist on Pinterest for both
    accurately and inaccurately-predicted test queries (true parents are emphasized in
    bold). The results showcase predictions on a variety of node-types present in the
    Pinterest taxonomy, from concrete entities such as locations (Luxor) and fictional
    characters (Thor) to abstract concepts such as depression. We observe that even
    inaccurately-predicted parents conform to some notion of relatedness and immediate
    hierarchy, suggesting potentially missing edges in the taxonomy.
    We also showcase Arborist’s predictions for search queries made on Pinterest that
    are not present in the taxonomy (Table 5, bottom). Qualitatively, Arborist is able
    to accurately associate unseen natural-language queries to potentially related nodes
    in the Pinterest taxonomy. Of note is the search query what causes blackheads, which
    is not just associated with its obvious parent skin concern, but also to the very
    relevant parent feelings.
    5.4 Ablation Study. The performance of Arborist may be attributed to two key
    modeling choices: (i) learning node-specific embeddings w to capture heterogeneous
    edge semantics, and (ii) optimizing a large-margin …
    Figure 2: Ablation study of Arborist on Pinterest: (a) summary of ablation study,
    (b) MRR for each value of constant margin vs. dynamic margins, (c) uniform vs.
    distance-weighted negative sampling.
    Figure 3: Effect of the number of linear maps k (top-left), the number of negative
    samples m (top-right) and the training-data fraction (bottom-left) on the MRR of
    Arborist on Pinterest. Also shown (bottom-right) is the average undirected
    shortest-path distance between predicted and true test parents (SPDist) with
    training epoch.”

  121. Summary


  122. Summary
    Expand taxonomies with heterogeneous, unobserved
    edge semantics for human-in-the-loop verification

  123. Summary
    Expand taxonomies with heterogeneous, unobserved
    edge semantics for human-in-the-loop verification
    Taxonomic Roles with Linear Maps
    [Background: paper Figure 1, a query q (SHAQ) scored against the node NBA via the
    node-specific linear map M_NBA built from maps P_1, …, P_k and role vector w_NBA,
    with implicit edge semantics IS-TYPE-OF, IS-PLAYER-OF and IS-LEAGUE-OF]

  124. Summary
    Expand taxonomies with heterogeneous, unobserved
    edge semantics for human-in-the-loop verification
    Taxonomic Roles with Linear Maps
    Large-Margin Loss with Dynamic Margins
    [Background: paper Figure 1 (as on the previous slide) and a paper excerpt:]
    “… and γ(u, v, v′) is the desired margin defined as a function of the child, parent
    and non-parent nodes. We now derive the loss function to be minimized in order to
    satisfy the large-margin constraint (5). Denote by E(u, v, v′) the degree to which
    a non-parent node v′ violates the large-margin constraint of child-parent
    pair (u, v):
        E(u, v, v′) = max[0, s(u, v′) − s(u, v) + γ(u, v, v′)].   (6)
    When the large-margin constraint is satisfied, E(u, v, v′) = 0 and the non-parent
    incurs no violation. Otherwise, E(u, v, v′) > 0. The overall loss function L(T) is
    the total violation of the large-margin constraints by the non-parents
    corresponding to every child-parent pair (u, v):
        L(T) = Σ_{(u,v)∈E} Σ_{v′∈V∖H(u)} E(u, v, v′)   (7)
    The node embeddings w and linear maps P_1, …, P_k are jointly trained to minimize
    L(T) via gradient descent. Given the trained parameters and a query node q ∉ V with
    feature vector e_q, predictions are made by ranking the taxonomy nodes v in
    decreasing order of their taxonomic relatedness s(q, v).”
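Prediction is then a straightforward ranking; a minimal sketch reusing the relatedness function from the earlier snippet (names illustrative):

    import numpy as np

    def predict_parents(e_q, node_ids, node_embs, top=4):
        # Rank all taxonomy nodes by taxonomic relatedness to the query q.
        scores = np.array([relatedness(e_q, e_v) for e_v in node_embs])
        order = np.argsort(-scores)
        return [node_ids[i] for i in order[:top]]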

  125. Summary
    Expand taxonomies with heterogeneous, unobserved
    edge semantics for human-in-the-loop verification
    Taxonomic Roles with Linear Maps
    Large-Margin Loss with Dynamic Margins
    Guarantees to Ease Human Verification
    [Background: paper Figure 1 and the loss excerpt from the previous slides, plus:]
    “… learned from the data [19]. We propose a principled dynamic margin function that
    requires no tuning, learning or heuristics. We relate the margins to shortest-path
    distances in the taxonomy between predicted and true parent nodes. Denote by
    d(·, ·) the undirected shortest-path distance between two nodes in the taxonomy.
    With the following theorem, we bound the undirected shortest-path distance between
    the highest-ranked predicted parent v̂(u) = argmax_v s(u, v) and any true parent for
    every child node u:
    Proposition 1. When γ(u, v, v′) = d(v, v′), L(T) is an upper bound on the sum of
    the undirected shortest-path distances between the highest-ranked predicted parents
    and true parents:
        Σ_{(u,v)∈E} d(v, v̂(u)) ≤ L(T).”

  126. CODE / RESOURCES / SLIDES / VIDEO
    cmuarborist.github.io
    [email protected]