Type-Aware Entity Retrieval

Date: June 14, 2016
Venue: Oslo, Norway. Doctoral Seminar at HiOA

Please cite, link to or credit this presentation when using it or part of it in your work.

#InformationRetrieval #IR #EntityRanking #EntityRetrieval #ER #EntityTypes #EntityOrientedSearch #KnowledgeBases #SemanticSearch

Darío Garigliotti

June 14, 2016

Transcript

  1. Type-aware Entity Retrieval

    Darío Garigliotti, University of Stavanger, June 14, 2016
    Sections: Types and Entity Retrieval; Environment Dimensions; Type-aware Entity Retrieval
  2. Outline

    1. Types and Entity Retrieval
    2. Environment Dimensions: type taxonomies, type representations, retrieval models
    3. Type-aware Entity Retrieval
  3. Types and Entity Retrieval

    Traditional information retrieval has recently been extended to entity-oriented search, which revolves around satisfying more complex information needs.
    Several kinds of entity elements from knowledge bases appear naturally in queries, e.g. "Countries where one can pay with the euro":
    - Related entities (via a relation or predicate)
    - Types (also called categories or classes)
  4. Types and Entity Retrieval

    [Figure slide; no text in the transcript]
  5. Types and Entity Retrieval

    [Figure slide; no text in the transcript]
  6. Types and Entity Retrieval: why think about types?

    - Entities are typed
    - Types are useful for retrieval, presentation, summarization, ...
    - Related tasks include:
      - Entity ranking (given a query and target categories)
      - List completion (given a query and entity examples, and possibly types)
      - Query target type identification
    - Our focus is on emergent dimensions to explore
  7. Type taxonomies

    Different knowledge bases provide different type taxonomies:
    - DBpedia Ontology
    - Freebase Types
    - Wikipedia Categories
    - YAGO Taxonomy
    These vary considerably in hierarchical structure and in how entity-type assignments are recorded, so normalisation efforts are needed.
  8. DBpedia Ontology

    - A well-designed hierarchy
    - Created manually by considering the most frequently used infoboxes in Wikipedia
    - Clean and consistent, but with limited coverage
    Types per level: Level 1: 58; Level 2: 114; Level 3: 142; Level 4: 213; Level 5: 45; Level 6: 17; Level 7: 1
  9. DBpedia Ontology

    [Figure slide; no text in the transcript]
  10. Freebase Types

    - A two-layer categorization system: types and domains
    - Entities are assigned only to types; most of them have "same as" links to DBpedia entities
    Types per level: Level 1: 92; Level 2: 1,626
  11. Wikipedia Categories

    - Consists of textual labels known as categories
    - Not a well-defined "is-a" hierarchy, but a graph
    - Category assignments are neither consistent nor complete
    - Requires a major normalisation strategy
    Types per level: Level 1: 27; Levels 2 to 10: 121,657; Levels 11 to 24: 410,697; Levels 25 to 34: 14,564
  12. YAGO Taxonomy

    - A deep subsumption hierarchy
    - Its classification schema is constructed by taking leaf categories from Wikipedia categories and then using WordNet synsets to establish the hierarchy
    Types per level: Level 1: 61; Levels 2 to 5: 80,384; Levels 6 to 10: 461,843; Levels 11 to 19: 26,383
  13. Type representations

    How to represent the hierarchical information?
  14. Type representations

    How to represent the hierarchical information?
    [Figure: an entity e placed in a type hierarchy with types t0 to t12]
    - Type(s) along path to top
  15. Type representations

    How to represent the hierarchical information?
    [Figure: an entity e placed in a type hierarchy with types t0 to t12, one panel per representation]
    - Type(s) along path to top
    - Top-level type(s)
  16. Type representations

    How to represent the hierarchical information?
    [Figure: an entity e placed in a type hierarchy with types t0 to t12, one panel per representation]
    - Type(s) along path to top
    - Top-level type(s)
    - Most specific type(s)
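
The three representations can be illustrated with a minimal sketch. The toy taxonomy, the `PARENT` mapping and all function names are assumptions made for illustration, not taken from the slides, and a single-parent (tree-shaped) hierarchy is assumed.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: child type -> parent type (None marks a top-level type).
PARENT: Dict[str, Optional[str]] = {
    "Agent": None,
    "Person": "Agent",
    "Athlete": "Person",
    "Place": None,
    "City": "Place",
}


def ancestors(t: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the strict ancestors of type t (t itself is excluded)."""
    found: Set[str] = set()
    p = parent[t]
    while p is not None:
        found.add(p)
        p = parent[p]
    return found


def path_to_top(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Assigned types plus every ancestor on the way up to a top-level type."""
    result = set(assigned)
    for t in assigned:
        result |= ancestors(t, parent)
    return result


def top_level(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Only the top-level types reached from the assigned types."""
    return {t for t in path_to_top(assigned, parent) if parent[t] is None}


def most_specific(assigned: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Types on the paths that are not an ancestor of another type on the paths."""
    full = path_to_top(assigned, parent)
    covered: Set[str] = set()
    for t in full:
        covered |= ancestors(t, parent)
    return full - covered
```

For an entity assigned `{"Athlete", "Person"}` in this toy taxonomy, the path-to-top representation contains all three of Athlete, Person and Agent, the top-level representation keeps only Agent, and the most specific representation keeps only Athlete.
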
  17. Retrieval models

    The retrieval task is defined in a generative probabilistic framework: entities e are scored for a query q by $P(q \mid e)$.
    Both the query and the entity are considered in the term space as well as in the type space:
    - Term-based similarity between the query and the entity
    - Type-based similarity between the target types of the query and the entity types
    [Figure: a query with its target types and an entity with its entity types; labels: Olympic games, Rio de Janeiro]
  18. Retrieval models: (strict) filtering model

    $P(q \mid e) = P(\theta_q^{T} \mid \theta_e^{T}) \cdot \chi[\mathrm{types}(q) \cap \mathrm{types}(e) \neq \emptyset]$
    Here $\theta^{T}$ denotes a term-based representation and $\chi$ is the indicator function, so only entities sharing at least one type with the query are kept.
    [Figure: the sets Types(q) and Types(e)]
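
A minimal sketch of strict filtering, assuming the term-based score is already computed by some retrieval model. The function names, the scores and the type labels are hypothetical; only the query and entity names echo the slide's example.

```python
from typing import Dict, List, Set, Tuple


def strict_filter_score(term_score: float,
                        query_types: Set[str],
                        entity_types: Set[str]) -> float:
    """Term-based score multiplied by the indicator that the type sets overlap."""
    indicator = 1.0 if query_types & entity_types else 0.0
    return term_score * indicator


def rank_strict(term_scores: Dict[str, float],
                types_of: Dict[str, Set[str]],
                query_types: Set[str]) -> List[Tuple[str, float]]:
    """Score every entity and sort by descending score."""
    scored = [
        (entity, strict_filter_score(score, query_types, types_of.get(entity, set())))
        for entity, score in term_scores.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical term-based scores and type assignments for the query "Olympic games".
term_scores = {"Rio de Janeiro": 0.4, "Olympic flame": 0.5}
types_of = {"Rio de Janeiro": {"City"}, "Olympic flame": {"Symbol"}}
ranking = rank_strict(term_scores, types_of, query_types={"City"})
```

With target type City, the entity without that type drops to a score of zero even though its term-based score is higher.
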
  19. Retrieval models: (soft) filtering model

    $P(q \mid e) = P(\theta_q^{T} \mid \theta_e^{T}) \cdot P(\theta_q^{\mathcal{T}} \mid \theta_e^{\mathcal{T}})$
    Here $\theta^{T}$ denotes a term-based representation and $\theta^{\mathcal{T}}$ a type-based one.
  20. Retrieval models: interpolation model

    $P(q \mid e) = (1 - \lambda) \cdot P(\theta_q^{T} \mid \theta_e^{T}) + \lambda \cdot P(\theta_q^{\mathcal{T}} \mid \theta_e^{\mathcal{T}})$
    Here $\theta^{T}$ denotes a term-based representation, $\theta^{\mathcal{T}}$ a type-based one, and $\lambda$ is the interpolation weight.
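
The difference between multiplying and interpolating the two similarities can be seen in a small sketch. It assumes both component scores are given as plain numbers; the function names and the example values are hypothetical.

```python
def soft_filter(term_score: float, type_score: float) -> float:
    """Soft filtering: product of the term-based and type-based similarities."""
    return term_score * type_score


def interpolate(term_score: float, type_score: float, lam: float) -> float:
    """Interpolation: mixture of the two similarities with weight lam on types."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * term_score + lam * type_score


# Hypothetical component scores for one entity.
matching = (soft_filter(0.5, 0.25), interpolate(0.5, 0.25, lam=0.5))
no_type_match = (soft_filter(0.5, 0.0), interpolate(0.5, 0.0, lam=0.5))
```

When the type-based similarity is zero, the product is zero as well, whereas the interpolated score still retains the weighted term-based part.
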
  21. What did we do?

    Parts of this section: What did we do?; Lessons learned; Query target type detection; Future work draft
    We systematically identified and compared all combinations of those dimensions:
    - 4 type taxonomies: DBpedia Ontology (3.9), Freebase Types (2015-03-31), Wikipedia Categories (for DBpedia 3.9) and YAGO Taxonomy (3.0.2)
    - 3 type representations: path-to-top, top-level, most specific
    - 3 models: strict filtering, soft filtering, interpolation
    - Environment: from idealized to realistic
      - Query types: oracle
      - Entities: fully typed in all the taxonomies
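
As a quick check of the size of the comparison, the dimensions listed on this slide can be enumerated. This is only a sketch of the configuration grid; the variable names are hypothetical and no experimental code from the talk is implied.

```python
from itertools import product

taxonomies = ["DBpedia Ontology", "Freebase Types", "Wikipedia Categories", "YAGO Taxonomy"]
representations = ["path-to-top", "top-level", "most specific"]
models = ["strict filtering", "soft filtering", "interpolation"]

# Every (taxonomy, representation, model) triple is one configuration to evaluate.
configurations = list(product(taxonomies, representations, models))
print(len(configurations))  # 4 * 3 * 3 = 36
```
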
  22. What did we do? Results

    [Figure slide; no text in the transcript]
  23. Lessons learned

    Summary of insights:
    - How to represent hierarchical entity type information? (RQ1) Using the most specific types appears to be the best way.
    - What (kind of) type taxonomies to use? (RQ2) Wikipedia, in combination with most specific types, performs best in both the idealized and the more realistic scenarios.
    - What combination model to choose? (RQ3) The interpolation model appears to be more robust.
  24. Further analysis: strict filtering vs interpolation models

    - Strict filtering treats target types as a set
    - Interpolation operates with a probability distribution over types
    - When we drop from the oracle every type assigned to fewer than 3 entities, interpolation adapts considerably better
    [Figure: two panels comparing DBpedia, Freebase, Wikipedia and YAGO with most-specific types]
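
The pruning step mentioned on this slide can be sketched as follows. The slide does not say over which entity collection the assignments are counted, so this sketch assumes a hypothetical entity-to-types mapping; the function name and the data are illustrative only.

```python
from collections import Counter
from typing import Dict, Set


def prune_rare_types(target_types: Set[str],
                     types_of: Dict[str, Set[str]],
                     min_entities: int = 3) -> Set[str]:
    """Keep only the target types assigned to at least min_entities entities."""
    counts = Counter(t for types in types_of.values() for t in types)
    return {t for t in target_types if counts[t] >= min_entities}


# Hypothetical assignments: type A has 3 entities, B has 2, C has 1.
types_of = {"e1": {"A", "B"}, "e2": {"A"}, "e3": {"A", "B"}, "e4": {"C"}}
pruned = prune_rare_types({"A", "B", "C"}, types_of)
```

In this toy example only type A survives the threshold of 3 entities.
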
  25. Further analysis: query-level ranking details

    Example: performance for the configuration (interpolation, most specific types, Wikipedia-3.9)
    - Query: "Which books by Kerouac were published by Viking Press?"
    - Types: 90 (including Viking Press books)
    - Types of the relevant entities whose ranking is hurt: all contain Viking Press books
  26. Further analysis: query-level ranking details

    Example: performance for the configuration (interpolation, most specific types, Wikipedia-3.9)
    - Query: "Give me all actors starring in Batman Begins"
    - The rankings of all 7 relevant entities improve
  27. Query target type detection

    Automatic query target type detection
    - Baselines:
      - Entity-centric: first rank entities by their relevance to the query, then look at the types of the top-k ranked entities
      - Type-centric: build a direct term-based representation for each type by aggregating the descriptions of entities of that type
    - Learning-to-rank with several features
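
The two baselines can be sketched roughly as below. The slide does not specify how the types of the top-k entities are aggregated, so summing the entities' retrieval scores per type is an assumption, as are all function names and the data layout.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def entity_centric_types(ranked_entities: List[Tuple[str, float]],
                         types_of: Dict[str, Set[str]],
                         k: int) -> List[Tuple[str, float]]:
    """Score each type by summing the scores of the top-k entities that have it."""
    scores: Dict[str, float] = defaultdict(float)
    for entity, score in ranked_entities[:k]:
        for t in types_of.get(entity, set()):
            scores[t] += score
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def type_centric_documents(descriptions: Dict[str, str],
                           types_of: Dict[str, Set[str]]) -> Dict[str, str]:
    """Build one pseudo-document per type from its entities' descriptions."""
    parts: Dict[str, List[str]] = defaultdict(list)
    for entity, description in descriptions.items():
        for t in types_of.get(entity, set()):
            parts[t].append(description)
    return {t: " ".join(texts) for t, texts in parts.items()}
```

In the type-centric case, the resulting pseudo-documents would then be ranked against the query with any term-based retrieval model, which is left out of this sketch.
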
  28. Query target type detection

    [Figure slide; no text in the transcript]
  29. Future work draft

    - Automatic query target type detection must be analysed further: experiments revisited with additional features and an expanded set of candidate types
    - Query classification, to decide whether a query is suitable for having its retrieval improved by a type-aware approach: its performance by itself, and its impact on the full system