Type-Aware Entity Retrieval

Date: June 10, 2016
Venue: Stavanger, Norway. Doctoral Seminar at the IAI group for the research visit of Prof. Kalervo Järvelin

Please cite, link to or credit this presentation when using it or part of it in your work.

#InformationRetrieval #IR #EntityRanking #EntityRetrieval #ER #EntityTypes #EntityOrientedSearch #KnowledgeBases #SemanticSearch

Darío Garigliotti

Transcript

  1. Type-aware Entity Retrieval

    Darío Garigliotti, University of Stavanger. June 10, 2016.
  2. Outline

    1. Types and Entity Retrieval
    2. Environment Dimensions: type taxonomies, type representations, retrieval models
    3. Type-aware Entity Retrieval
  3. Types and Entity Retrieval

    Traditional information retrieval has recently been extended to entity-oriented search, which revolves around satisfying more complex information needs.
    Several entity-related elements from knowledge bases appear naturally in queries, e.g. "Countries where one can pay with the euro":
    - Related entities (via a relation or predicate)
    - Types, also called categories or classes
  4. Types and Entity Retrieval
  5. Types and Entity Retrieval
  6. Types and Entity Retrieval

    Why think about types?
    - Entities are typed.
    - Types are useful for retrieval, presentation, summarization...
    - Related tasks, e.g.:
      - Entity ranking (given a query and target categories)
      - List completion (given a query and entity examples, and possibly types)
      - Query target type identification
    Our focus is on emergent dimensions to explore.
  7. Type taxonomies

    There are different type taxonomies from various knowledge bases:
    - DBpedia Ontology
    - Freebase Types
    - Wikipedia Categories
    - YAGO Taxonomy
    These vary considerably in hierarchical structure and in how entity-type assignments are recorded. Normalisation efforts are needed.
  8. DBpedia Ontology

    A well-designed hierarchy, created manually by considering the most frequently used infoboxes in Wikipedia. Clean and consistent, but with limited coverage.
    Number of types per level:
    - Level 1: 58 types
    - Level 2: 114 types
    - Level 3: 142 types
    - Level 4: 213 types
    - Level 5: 45 types
    - Level 6: 17 types
    - Level 7: 1 type
  9. DBpedia Ontology
  10. Freebase Types

    A two-layer categorization system: types and domains. Entities are assigned only to types, and most of them have "same as" links to DBpedia entities.
    Number of types per level:
    - Level 1: 92 types
    - Level 2: 1,626 types
  11. Wikipedia Categories

    It consists of textual labels known as categories. It is not a well-defined "is-a" hierarchy, but a graph. Category assignments are neither consistent nor complete, so it requires a major normalisation strategy.
    Number of types per level range:
    - Level 1: 27 types
    - Levels 2 to 10: 121,657 types
    - Levels 11 to 24: 410,697 types
    - Levels 25 to 34: 14,564 types
  12. YAGO Taxonomy

    A deep subsumption hierarchy. Its classification schema is constructed by taking leaf categories from Wikipedia categories and then using WordNet synsets to establish the hierarchy.
    Number of types per level range:
    - Level 1: 61 types
    - Levels 2 to 5: 80,384 types
    - Levels 6 to 10: 461,843 types
    - Levels 11 to 19: 26,383 types
  13. Type representations

    How to represent the hierarchical information?
  14. Type representations

    How to represent the hierarchical information?
    [Figure: a type hierarchy with types t0 to t12 and an entity e]
    - Type(s) along path to top
  15. Type representations

    How to represent the hierarchical information?
    [Figure: a type hierarchy with types t0 to t12 and an entity e, one panel per option]
    - Type(s) along path to top
    - Top-level type(s)
  16. Type representations

    How to represent the hierarchical information?
    [Figure: a type hierarchy with types t0 to t12 and an entity e, one panel per option]
    - Type(s) along path to top
    - Top-level type(s)
    - Most specific type(s)
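
The three type representations can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy taxonomy `PARENT`, the function names, and the choice to exclude the root (level 0) from every representation. It is not the implementation behind the slides.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (child -> parent); "Thing" plays the role of the root.
PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Artist": "Person",
    "Place": "Thing",
    "City": "Place",
}


def path_to_top(type_name: str) -> List[str]:
    """Return the type and its ancestors, most specific first, excluding the root."""
    path: List[str] = []
    current: Optional[str] = type_name
    while current is not None and PARENT[current] is not None:
        path.append(current)
        current = PARENT[current]
    return path


def types_along_path(assigned: Set[str]) -> Set[str]:
    """Representation 1: every type on the path from each assigned type to the top."""
    return {ancestor for t in assigned for ancestor in path_to_top(t)}


def top_level_types(assigned: Set[str]) -> Set[str]:
    """Representation 2: only the level-1 ancestor of each assigned type."""
    return {path[-1] for path in map(path_to_top, assigned) if path}


def most_specific_types(assigned: Set[str]) -> Set[str]:
    """Representation 3: types that are not an ancestor of another type of the entity."""
    expanded = types_along_path(assigned)
    ancestors = {a for t in expanded for a in path_to_top(t)[1:]}
    return expanded - ancestors
```

For an entity assigned to `{"Person", "Artist", "City"}`, the sketch keeps `Artist` and `City` as most specific types and `Agent` and `Place` as top-level types.
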
  17. Retrieval models

    The retrieval task is defined in a generative probabilistic framework, P(q | e).
    [Figure labels: query ("Olympic games") with its target types; entity ("Rio de Janeiro") with its entity types; term-based similarity; type-based similarity]
    Both query and entity are considered in the term space as well as in the type space.
  18. Retrieval models

    (Strict) Filtering model
    $P(q \mid e) = P(\theta^{T}_{q} \mid \theta^{T}_{e}) \cdot \chi[\mathrm{types}(q) \cap \mathrm{types}(e) \neq \emptyset]$
    Here $\theta^{T}$ denotes the term-based model and $\chi$ is an indicator function: an entity keeps its term-based score only if it shares at least one type with the query.
    [Figure: overlap between Types(q) and Types(e)]
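
The strict filtering model can be sketched in a few lines of Python. The function name and the assumption that the term-based score is already computed by some other component are illustrative choices, not taken from the slides.

```python
from typing import Set


def strict_filter_score(term_score: float,
                        query_types: Set[str],
                        entity_types: Set[str]) -> float:
    """Keep the term-based score only if query and entity share at least one type.

    term_score stands in for the term-based similarity, assumed to be given.
    """
    shares_a_type = bool(query_types & entity_types)  # indicator function
    return term_score if shares_a_type else 0.0
```

With this sketch, an entity that has no type in common with the query target types is scored 0 regardless of how well its text matches.
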
  19. Retrieval models

    (Soft) Filtering model
    $P(q \mid e) = P(\theta^{T}_{q} \mid \theta^{T}_{e}) \cdot P(\theta^{\mathcal{T}}_{q} \mid \theta^{\mathcal{T}}_{e})$
    The first factor is the term-based similarity ($\theta^{T}$) and the second is the type-based similarity ($\theta^{\mathcal{T}}$).
  20. Retrieval models

    Interpolation model
    $P(q \mid e) = (1 - \lambda) \cdot P(\theta^{T}_{q} \mid \theta^{T}_{e}) + \lambda \cdot P(\theta^{\mathcal{T}}_{q} \mid \theta^{\mathcal{T}}_{e})$
    The first term is the term-based similarity ($\theta^{T}$), the second is the type-based similarity ($\theta^{\mathcal{T}}$), and $\lambda$ weights the two.
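
The soft filtering and interpolation models differ only in how they combine the two similarities. The sketch below assumes both the term-based and the type-based similarity are already available as numbers; the function names and the range check on `lam` are illustrative assumptions.

```python
def soft_filter_score(term_score: float, type_score: float) -> float:
    """Soft filtering: multiply the term-based and type-based similarities."""
    return term_score * type_score


def interpolation_score(term_score: float, type_score: float, lam: float) -> float:
    """Interpolation: weighted sum of the two similarities, with weight lam on types."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * term_score + lam * type_score
```

One visible difference: when the type-based similarity is 0, soft filtering scores the entity 0, while interpolation still keeps a share of the term-based score.
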
  21. What did we do?

    We systematically identified and compared all combinations of those dimensions:
    - 4 type taxonomies: DBpedia Ontology (3.9), Freebase Types (2015-03-31), Wikipedia Categories (for DBpedia 3.9), and YAGO Taxonomy (3.0.2)
    - 3 type representations: path-to-top, top-level, most specific
    - 3 models: strict filtering, soft filtering, interpolation
    - Environment, from idealized to realistic:
      - oracle query types
      - entities fully typed in all the taxonomies
  22. What did we do?

    Results
  23. Lessons learned

    Summary of insights:
    - How to represent hierarchical entity type information? (RQ1)
      Using the most specific types appears to be the best way.
    - What (kind of) type taxonomies to use? (RQ2)
      Wikipedia, in combination with most specific types, performs the best in both the idealized and the more realistic scenarios.
    - What combination model to choose? (RQ3)
      The interpolation model appears to be more robust.
  24. Further analysis: strict filtering vs interpolation models

    - Strict filtering treats target types as a set.
    - Interpolation operates with a probability distribution over types.
    - When every type assigned to fewer than 3 entities is dropped from the oracle, interpolation adapts noticeably better.
    [Figures: two panels comparing DBpedia, Freebase, Wikipedia, and YAGO, using most-specific types]
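
The pruning step in this analysis, dropping every oracle type assigned to fewer than 3 entities, could look like the sketch below. The function name, the data layout (a mapping from entity to its set of types), and the default threshold parameter are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Set


def drop_rare_types(entity_types: Dict[str, Set[str]],
                    target_types: Set[str],
                    min_entities: int = 3) -> Set[str]:
    """Keep only the target types that are assigned to at least min_entities entities."""
    # Count how many entities carry each type.
    support = Counter(t for types in entity_types.values() for t in types)
    return {t for t in target_types if support[t] >= min_entities}
```

If pruning leaves a query with no target types, strict filtering scores every entity 0, whereas interpolation still ranks by the term-based similarity.
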
  25. Further analysis: query-level ranking details

    E.g. performance for (Interpolation, Most specific level, Wikipedia-3.9)
    Query: "Which books by Kerouac were published by Viking Press?"
    Types: 90 (including Viking Press books)
    Types of the relevant entities whose ranking is hurt: all contain Viking Press books
  26. Further analysis: query-level ranking details

    E.g. performance for (Interpolation, Most specific level, Wikipedia-3.9)
    Query: "Give me all actors starring in Batman Begins"
    All 7 relevant entities are improved.
  27. What are we doing?

    Automatic query target type detection
    - Baselines:
      - Entity-centric: first rank entities by their relevance to the query, then look at what types the top-k ranked entities have.
      - Type-centric: build a direct term-based representation for each type by aggregating the descriptions of entities of that type.
    - Learning-to-rank with several features
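
The two baselines can be sketched as follows. This is a rough illustration under stated assumptions, not the method used in the work: the entity-centric variant sums the retrieval scores of the top-k entities per type, the type-centric variant scores a type by the relative frequency of the query terms in its aggregated descriptions, and all function names and data layouts are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def entity_centric(ranked_entities: List[Tuple[str, float]],
                   entity_types: Dict[str, Set[str]],
                   k: int) -> List[Tuple[str, float]]:
    """Score each type by summing the scores of the top-k entities that have it."""
    scores: Dict[str, float] = defaultdict(float)
    for entity, score in ranked_entities[:k]:
        for t in entity_types.get(entity, set()):
            scores[t] += score
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def type_centric(query: str,
                 entity_descriptions: Dict[str, str],
                 entity_types: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Score each type by how often the query terms occur in its aggregated text."""
    type_terms: Dict[str, Counter] = defaultdict(Counter)
    for entity, description in entity_descriptions.items():
        for t in entity_types.get(entity, set()):
            type_terms[t].update(description.lower().split())
    query_terms = query.lower().split()
    scores = {
        t: sum(counts[w] for w in query_terms) / sum(counts.values())
        for t, counts in type_terms.items()
        if counts  # skip types whose aggregated description is empty
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Both functions return types sorted by decreasing score, so the top of either list is the candidate target type for the query.
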
  28. What are we doing?

    Target type detection
  29. Thanks!

    Questions?