On Type-Aware Entity Retrieval

Date: October 2nd, 2017
Venue: Amsterdam, The Netherlands. The 2017 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR '17)
Corresponding article: https://arxiv.org/abs/1708.08291

Please cite, link to or credit this presentation when using it or part of it in your work.

#InformationRetrieval #IR #EntityRanking #EntityRetrieval #ER #EntityTypes #EntityOrientedSearch #KnowledgeBases #SemanticSearch

Darío Garigliotti

October 02, 2017

Transcript

  1. On Type-Aware Entity Retrieval. Darío Garigliotti and Krisztian Balog

    University of Stavanger. 3rd ACM International Conference on the Theory of Information Retrieval, Amsterdam, The Netherlands, October 2, 2017
  2. Outline

    (1) Type-Aware Entity Retrieval, (2) Dimensions of Type Information, (3) Results and Analysis, (4) Conclusions and Future Work
  3. Entity Types

    A characteristic property of entities is that they are typed. Types are organized in hierarchies (or taxonomies). Example hierarchy (diagram): Agent > Person > Scientist, with the entity Enrico Fermi typed as Scientist.
  4. Query Target Types

    Target types are the types of entities sought by the query. Example (diagram): for the query "italian nobel prize winners", the target types Scientist, Artist and Writer are highlighted under Person, which is under Agent.
  5. Target Types

    Target types occur in many queries, for example: "countries where one can pay with the euro", "art museums in Amsterdam", "italian nobel prize winners". Types help to reduce the search space.
  6. Target Types

    E.g. buying a book.
  7. Dimensions of Type Information

    Type information has been shown to improve entity retrieval, for example in the INEX Entity Ranking track. We systematically identify and compare all combinations of three dimensions of type information: type taxonomies, type representations, and retrieval models.
  8. Dimension: Type Taxonomies

    Which type taxonomy to use? DBpedia Ontology (7 levels, 600 types); Freebase Types (2 levels, 2K types); Wikipedia Categories (34 levels, 600K types); YAGO Taxonomy (19 levels, 500K types). These vary a lot in terms of hierarchical structure and in how entity-type assignments are recorded.
  9. Dimension: Type Representations

    How to represent the hierarchical information? (Diagram: an entity e attached to a type hierarchy with types t0 to t12.) Option shown: type(s) along path to top.
  10. Dimension: Type Representations

    How to represent the hierarchical information? (Diagram: an entity e attached to a type hierarchy with types t0 to t12.) Options shown: type(s) along path to top; top-level type(s).
  11. Dimension: Type Representations

    How to represent the hierarchical information? (Diagram: an entity e attached to a type hierarchy with types t0 to t12.) Options shown: type(s) along path to top; top-level type(s); most specific type(s).
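
The three representation options can be sketched over a toy hierarchy. This is only one plausible reading of the slide: the taxonomy, the helper names (`chain`, `path_to_top`, `top_level`, `most_specific`) and the choice to exclude the root are assumptions for illustration, not the authors' implementation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy: type -> parent type; the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Scientist": "Person",
    "Place": "Thing",
}


def chain(t: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return t followed by its ancestors, bottom-up, excluding the root."""
    out: List[str] = []
    current: Optional[str] = t
    while current is not None and parent.get(current) is not None:
        out.append(current)
        current = parent[current]
    return out


def path_to_top(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Assigned types plus all of their ancestors (root excluded)."""
    return {a for t in assigned for a in chain(t, parent)}


def top_level(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Only the types on those paths that sit directly below the root."""
    return {t for t in path_to_top(assigned, parent)
            if parent.get(parent[t]) is None}


def most_specific(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Types on those paths that are not an ancestor of another type on them."""
    all_types = path_to_top(assigned, parent)
    ancestors = {a for t in all_types for a in chain(t, parent)[1:]}
    return all_types - ancestors


entity_types = {"Person", "Scientist"}  # hypothetical assignment for one entity
print(sorted(path_to_top(entity_types, PARENT)))    # ['Agent', 'Person', 'Scientist']
print(sorted(top_level(entity_types, PARENT)))      # ['Agent']
print(sorted(most_specific(entity_types, PARENT)))  # ['Scientist']
```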
  12. Dimension: Retrieval Models

    How to add type information into entity retrieval? The retrieval task is defined in a generative probabilistic framework, P(q | e). Both query and entity are considered in the term space as well as in the type space. (Diagram: a query and an entity, illustrated with "Olympic games" and "Rio de Janeiro", linked by a term-based similarity and, between the target types and the entity types, a type-based similarity.)
  13. Dimension: Retrieval Models

    (Strict) Filtering model: $P(q \mid e) = P(\theta^{T}_{q} \mid \theta^{T}_{e}) \cdot \chi[\mathrm{types}(q) \cap \mathrm{types}(e) \neq \emptyset]$, where $P(\theta^{T}_{q} \mid \theta^{T}_{e})$ is the term-based similarity and $\chi$ is the indicator function. (Diagram: the sets Types(q) and Types(e).)
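
A minimal sketch of strict filtering as a re-scoring step: an entity keeps its term-based score only if it shares at least one type with the query, and scores zero otherwise. The term-based scores and type sets below are hypothetical inputs, and the function names are illustrative.

```python
from typing import Dict, List, Set, Tuple


def strict_filter_score(term_score: float,
                        query_types: Set[str],
                        entity_types: Set[str]) -> float:
    """Term-based score if query and entity share a type, else 0."""
    return term_score if query_types & entity_types else 0.0


def rank(term_scores: Dict[str, float],
         query_types: Set[str],
         types_of: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Re-score every candidate and sort by descending score."""
    scored = [
        (e, strict_filter_score(s, query_types, types_of.get(e, set())))
        for e, s in term_scores.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates: e2 has the best term score but the wrong type,
# and e3 has no type assignment at all.
ranking = rank(
    {"e1": 0.2, "e2": 0.5, "e3": 0.1},
    {"Scientist"},
    {"e1": {"Scientist"}, "e2": {"Place"}},
)
print(ranking[0])
```

Note that an entity without any type assignment (e3 here) is filtered out as well, which is why missing type assignments matter for this model.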
  14. Dimension: Retrieval Models

    (Soft) Filtering model: $P(q \mid e) = P(\theta^{T}_{q} \mid \theta^{T}_{e}) \cdot P(\theta^{\mathcal{T}}_{q} \mid \theta^{\mathcal{T}}_{e})$, where the first factor is the term-based similarity and the second is the type-based similarity.
  15. Dimension: Retrieval Models

    Interpolation model: $P(q \mid e) = (1 - \lambda) \cdot P(\theta^{T}_{q} \mid \theta^{T}_{e}) + \lambda \cdot P(\theta^{\mathcal{T}}_{q} \mid \theta^{\mathcal{T}}_{e})$, where the first component is the term-based similarity, the second is the type-based similarity, and $\lambda$ weights the type-based component.
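
A small sketch contrasting the two ways of combining a term-based score with a type-based score: multiplying them (soft filtering) versus mixing them linearly (interpolation). Both scores are treated as given numbers in [0, 1]; how they are estimated is outside this sketch, and the values are hypothetical.

```python
def soft_filter_score(term_score: float, type_score: float) -> float:
    """Product of term-based and type-based similarity."""
    return term_score * type_score


def interpolation_score(term_score: float, type_score: float, lam: float) -> float:
    """Linear mixture; lam is the weight of the type-based component."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return (1.0 - lam) * term_score + lam * type_score


# Hypothetical entity with a decent term match but no type evidence.
term_score, type_score = 0.4, 0.0
print(soft_filter_score(term_score, type_score))           # 0.0
print(interpolation_score(term_score, type_score, 0.25))   # roughly 0.3
```

With a zero type-based score the product collapses to zero, while the mixture still retains a share (1 - lam) of the term-based score.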
  16. Experimental Setup

    Test collection of DBpedia entities [1]. Baseline: Mixture of Language Models (title and content). Idealized assumption of a target types oracle. Settings for type assignments.
    [1] Krisztian Balog and Robert Neumayer. 2013. A Test Collection for Entity Search in DBpedia. In Proc. of SIGIR. 737–740.
  17. Experimental Setup: Target Types Oracle

    An oracle provides us with the (distribution of) correct target types for a given query. Construction: given a query, take the union of all types of all its relevant entities. The probability of each type is proportional to the number of relevant entities having that type.
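
The oracle construction can be sketched in a few lines: count, for each type, how many relevant entities carry it, then normalize the counts. The entity identifiers and type names below are hypothetical toy data, and `target_type_oracle` is an illustrative name.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def target_type_oracle(relevant_entities: Iterable[str],
                       entity_types: Dict[str, Set[str]]) -> Dict[str, float]:
    """Distribution over the union of types of the relevant entities.

    Each type's probability is proportional to the number of relevant
    entities that have it.
    """
    counts: Counter = Counter()
    for entity in relevant_entities:
        counts.update(entity_types.get(entity, set()))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


# Hypothetical relevance judgments for one query.
types_of = {"e1": {"Scientist"}, "e2": {"Scientist"}, "e3": {"Writer"}}
oracle = target_type_oracle(["e1", "e2", "e3"], types_of)
print(oracle)  # Scientist gets 2/3 of the mass, Writer 1/3
```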
  18. Experimental Setup: Type Assignments

    Two settings to deal with missing type assignments.
    4TT: only entities with types in all four type taxonomies. E.g. types for the entity Enrico Fermi: in DBpedia, Scientist; in Freebase, award.award_winner, people.deceased_person, education.academic, ...; in Wikipedia, Nobel laureates in Physics, University of Pisa alumni, ...; in YAGO, ItalianPhysicists, NobelLaureatesInPhysics, AmericanPeopleOfItalianDescent, ...
    ALL: all available entities are allowed.
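
As a rough illustration of the two settings, the 4TT subset can be obtained by keeping only entities that have at least one type in every taxonomy, while ALL keeps every entity. The data layout (entity -> taxonomy -> set of types), the taxonomy keys, and the toy assignments are assumptions for this sketch.

```python
from typing import Dict, Set

TAXONOMIES = ("dbpedia", "freebase", "wikipedia", "yago")  # illustrative keys


def entities_4tt(assignments: Dict[str, Dict[str, Set[str]]]) -> Set[str]:
    """Entities with a non-empty type set in all four taxonomies."""
    return {
        entity
        for entity, per_taxonomy in assignments.items()
        if all(per_taxonomy.get(tax) for tax in TAXONOMIES)
    }


def entities_all(assignments: Dict[str, Dict[str, Set[str]]]) -> Set[str]:
    """Every available entity, regardless of missing type assignments."""
    return set(assignments)


# Hypothetical assignments: e2 has no YAGO type, e3 has no types at all.
toy = {
    "e1": {"dbpedia": {"A"}, "freebase": {"b.c"}, "wikipedia": {"Cat"}, "yago": {"Y"}},
    "e2": {"dbpedia": {"A"}, "freebase": {"b.c"}, "wikipedia": {"Cat"}},
    "e3": {},
}
print(sorted(entities_4tt(toy)))  # ['e1']
print(sorted(entities_all(toy)))  # ['e1', 'e2', 'e3']
```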
  19. Research Questions

    RQ1: What is the impact of the particular choice of type taxonomy on entity retrieval performance? RQ2: How to represent hierarchical entity type information for entity retrieval? RQ3: How to combine term-based and type-based information?
  20. Results

    Wikipedia, in combination with the most specific type representation, performs best (for both 4TT and ALL). Highly significant improvements for all models in 4TT.
  21. Results

    RQ1: What is the impact of the particular choice of type taxonomy on entity retrieval performance? Wikipedia, in combination with the most specific type representation, performs best (for both 4TT and ALL). Highly significant improvements for all models in 4TT.
  22. Results

    RQ2: How to represent hierarchical entity type information for entity retrieval? Using the most specific types in the hierarchy provides the best performance. There is no evidence that hierarchical relationships from ancestor types would benefit retrieval effectiveness.
  23. Results

    RQ3: How to combine term-based and type-based information? In the 4TT setting, strict filtering is the best retrieval model. Only the interpolation model can deal in a robust manner with the loss of type assignments in the ALL setting.
  24. Results
  25. Summary of Findings

    Using the most specific types is the most effective way to represent hierarchical entity type information. Wikipedia performs best across all type taxonomies in most of the cases. All models to combine term- and type-based information suffer from missing type information, but interpolation appears to be the most robust.
  26. An Instance of Query-level Analysis

    Query: italian nobel prize winners. Baseline: MAP 0.1607.
  27. An Instance of Query-level Analysis

    Query: italian nobel prize winners. Baseline: MAP 0.1607. DBpedia, most specific, soft filter: MAP 0.1829; target types: Artist, Scientist, Writer.
  28. An Instance of Query-level Analysis

    Query: italian nobel prize winners. Baseline: MAP 0.1607. DBpedia, most specific, soft filter: MAP 0.1829; target types: Artist, Scientist, Writer. Wikipedia, most specific, interpolation (0.95): MAP 0.8518; target types: Italian Nobel laureates, Nobel laureates in Literature, ...
  29. What is in a Target Type?

    What portion of relevant entities can target types capture?

    Top-K types  Type taxonomy  P       R       F1
    K = 1        DBpedia        0.0027  0.5863  0.0046
    K = 1        Freebase       0.0060  0.7254  0.0076
    K = 1        Wikipedia      0.1147  0.4798  0.1287
    K = 1        YAGO           0.0418  0.6303  0.0488
    K = 3        DBpedia        0.0006  0.7199  0.0012
    K = 3        Freebase       0.0004  0.7805  0.0008
    K = 3        Wikipedia      0.0402  0.5847  0.0614
    K = 3        YAGO           0.0036  0.7025  0.0062

    Fine-grained types in the Wikipedia category graph can capture some subset of relevant entities with the highest P and F1.
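
The slide does not spell out how P, R and F1 are computed. One plausible reading, sketched here purely as an assumption, is that for each query the set of entities carrying any of the top-K target types is compared against the set of relevant entities. The function name and toy data are hypothetical.

```python
from typing import Dict, Set, Tuple


def type_coverage_prf(target_types: Set[str],
                      relevant: Set[str],
                      entity_types: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of 'entities having a target type' vs. relevant ones."""
    covered = {e for e, types in entity_types.items() if types & target_types}
    hits = len(covered & relevant)
    precision = hits / len(covered) if covered else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical collection: three entities have type A, one of them is relevant.
toy_types = {"e1": {"A"}, "e2": {"A"}, "e3": {"B"}, "e4": {"A"}}
p, r, f1 = type_coverage_prf({"A"}, {"e1", "e3"}, toy_types)
print(p, r, f1)
```

Under this reading, a coarse type that covers a large share of the collection gets high recall but very low precision, which matches the general pattern in the table.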
  30. Conclusions and Future Work

    In this work: We identify and systematically compare distinct dimensions in type-aware entity retrieval. We observe that type information proves most useful when larger, deeper type taxonomies provide very specific types.
    In future work: We plan to report further query-level analyses. We wish to re-assess the experiments using automatically identified target types [2].
    [2] Darío Garigliotti, Faegheh Hasibi, and Krisztian Balog. 2017. Target Type Identification for Entity-Bearing Queries. In Proc. of SIGIR. 845–848.
  33. Appendices
  34. Appendix: Retrieval Model

    Interpolation model. For DBpedia and Freebase, adding more type-based information is always increasingly harmful. For Wikipedia and YAGO, performance increases with a higher contribution of type information when using the most specific types.
    Figure 1: Interpolation performance (MAP, y-axis) for different type weights λt (x-axis, 0 to 1) in the 4TT setting, with one curve each for DBpedia, Freebase, Wikipedia and YAGO. Panels: (a) path-to-top types, (b) top-level types, (c) most specific types.
  35. Appendix: Revisited Target Types Oracle

    The target types distribution of the default oracle includes all types associated with known relevant entities. Alternatively, we assess the configurations using a filtered oracle of target types that satisfy a threshold of coverage of relevant entities.
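
A hedged sketch of how a filtered oracle might differ from the default one: types are kept only if the fraction of relevant entities carrying them reaches a threshold, and the remaining counts are renormalized. The coverage definition, the threshold value, and the renormalization step are assumptions, since the slide does not specify them.

```python
from collections import Counter
from typing import Dict, List, Set


def filtered_oracle(relevant_entities: List[str],
                    entity_types: Dict[str, Set[str]],
                    min_coverage: float) -> Dict[str, float]:
    """Target type distribution restricted to types with enough coverage.

    Coverage of a type is taken here as the fraction of relevant entities
    that have it (an assumption for illustration).
    """
    counts: Counter = Counter()
    for entity in relevant_entities:
        counts.update(entity_types.get(entity, set()))
    n = len(relevant_entities)
    kept = {t: c for t, c in counts.items() if n and c / n >= min_coverage}
    total = sum(kept.values())
    return {t: c / total for t, c in kept.items()}


# Hypothetical judgments: type C is held by only one of four relevant entities.
toy_types = {"e1": {"A", "B"}, "e2": {"A"}, "e3": {"A", "C"}, "e4": {"A", "B"}}
filtered = filtered_oracle(["e1", "e2", "e3", "e4"], toy_types, min_coverage=0.5)
print(filtered)  # C is dropped; A and B share the probability mass
```

A rare but very specific type (C here) is exactly what such a filter removes.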
  36. Appendix: Revisited Target Types Oracle

    Figure: MAP (y-axis, 0 to 0.3) per type taxonomy (DBpedia, Freebase, Wikipedia, YAGO), comparing the default and filtered target types oracles for the strict filtering, soft filtering and interpolation models. Panels: (a) path-to-top types, (b) top-level types, (c) most specific types.
    The filtered oracle leads to considerable drops in performance for settings using the most specific types. It is important to consider all possible target types.