Slide 1

Slide 1 text

On Type-Aware Entity Retrieval
Darío Garigliotti and Krisztian Balog
University of Stavanger
3rd ACM International Conference on the Theory of Information Retrieval
Amsterdam, The Netherlands - October 2, 2017

Slide 2

Slide 2 text

We thank SIGIR for the Students Travel Grant

Slide 3

Slide 3 text

Outline:
1 Type-Aware Entity Retrieval
2 Dimensions of Type Information
3 Results and Analysis
4 Conclusions and Future Work

Slide 4

Slide 4 text

Entity Types
- A characteristic property of entities is that they are typed.
- Types are organized in hierarchies (or taxonomies).
- Example hierarchy: Agent > Person > Scientist, with the entity Enrico Fermi typed as Scientist.

Slide 5

Slide 5 text

Query Target Types
- Target types: the types of entities sought by the query.
- Example: for the query "italian nobel prize winners", the target types are Scientist, Artist, and Writer (all below Person, which is below Agent).

Slide 6

Slide 6 text

Target Types
- Target types occur in many queries:
    countries where one can pay with the euro
    art museums in Amsterdam
    italian nobel prize winners
- Types help to reduce the search space.

Slide 7

Slide 7 text

Target Types
E.g., buying a book

Slide 8

Slide 8 text

Dimensions of Type Information
- Type information has been shown to improve entity retrieval (INEX Entity Ranking track).
- We systematically identify and compare all combinations of three dimensions of type information:
    Type taxonomies
    Type representations
    Retrieval models

Slide 9

Slide 9 text

Dimension: Type Taxonomies
Which type taxonomy to use?
- DBpedia Ontology (7 levels, 600 types)
- Freebase Types (2 levels, 2K types)
- Wikipedia Categories (34 levels, 600K types)
- YAGO Taxonomy (19 levels, 500K types)
These vary considerably in hierarchical structure and in how entity-type assignments are recorded.

Slide 10

Slide 10 text

Dimension: Type Representations
How to represent the hierarchical information?
[Figure: type hierarchy with nodes t0 to t12 and an entity e.]
- Type(s) along path to top

Slide 11

Slide 11 text

Dimension: Type Representations
How to represent the hierarchical information?
[Figure: type hierarchy with nodes t0 to t12 and an entity e, shown once per representation.]
- Type(s) along path to top
- Top-level type(s)

Slide 12

Slide 12 text

Dimension: Type Representations
How to represent the hierarchical information?
[Figure: type hierarchy with nodes t0 to t12 and an entity e, shown once per representation.]
- Type(s) along path to top
- Top-level type(s)
- Most specific type(s)
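
The three representations can be illustrated with a small sketch. It assumes a toy taxonomy stored as a child-to-parent map, and it treats types without a parent as "top-level"; the type names, the `PARENT` map, and the function names are all illustrative and are not taken from the slides.

```python
from typing import Dict, Optional, Set

# Hypothetical toy taxonomy: child type -> parent type (None = no parent).
PARENT: Dict[str, Optional[str]] = {
    "Agent": None,
    "Person": "Agent",
    "Scientist": "Person",
    "Artist": "Person",
}


def path_to_top(types: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Assigned types plus all of their ancestors."""
    result: Set[str] = set()
    for t in types:
        node: Optional[str] = t
        # Stop early when a node is already known: its ancestors are too.
        while node is not None and node not in result:
            result.add(node)
            node = parent.get(node)
    return result


def top_level(types: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Ancestors (or assigned types) that have no parent (assumption)."""
    return {t for t in path_to_top(types, parent) if parent.get(t) is None}


def most_specific(types: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Assigned types that are not an ancestor of another assigned type."""
    ancestors: Set[str] = set()
    for t in types:
        node = parent.get(t)
        while node is not None:
            ancestors.add(node)
            node = parent.get(node)
    return {t for t in types if t not in ancestors}
```

For an entity assigned both Scientist and Person, the three functions return {Scientist, Person, Agent}, {Agent}, and {Scientist} respectively.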

Slide 13

Slide 13 text

Dimension: Retrieval Models
How to add type information into entity retrieval?
- The retrieval task is defined in a generative probabilistic framework: P(q | e).
- Both the query and the entity are considered in the term space as well as in the type space.
- Diagram: a query (e.g. Olympic games) with its target types is compared to an entity (e.g. Rio de Janeiro) with its entity types, through a term-based similarity and a type-based similarity.

Slide 14

Slide 14 text

Dimension: Retrieval Models
(Strict) Filtering model
$P(q \mid e) = P(\theta^{T}_q \mid \theta^{T}_e) \cdot \chi[\mathrm{types}(q) \cap \mathrm{types}(e) \neq \emptyset]$
Here $\theta^{T}$ denotes the term-based representation and $\chi[\cdot]$ is a binary indicator: only entities that share at least one type with the query are kept.
[Diagram: overlap between types(q) and types(e).]
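
A minimal sketch of strict filtering, assuming the term-based score has already been computed by some text retrieval model; the function and parameter names are illustrative.

```python
from typing import Set


def strict_filter_score(
    term_score: float, query_types: Set[str], entity_types: Set[str]
) -> float:
    """Keep the term-based score only if query and entity share a type."""
    indicator = 1.0 if query_types & entity_types else 0.0
    return term_score * indicator
```

An entity with no type assignment always receives a score of zero under this model, whatever its term-based score.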

Slide 15

Slide 15 text

Dimension: Retrieval Models
(Soft) Filtering model
$P(q \mid e) = P(\theta^{T}_q \mid \theta^{T}_e) \cdot P(\theta^{\mathcal{T}}_q \mid \theta^{\mathcal{T}}_e)$
The first factor is the term-based similarity (superscript $T$) and the second is the type-based similarity (superscript $\mathcal{T}$).
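
A minimal sketch of soft filtering. The slide does not say how the type-based similarity is estimated, so the sketch uses a simple stand-in (the probability mass of the query's target types that falls on the entity's types). That stand-in, and all names below, are assumptions for illustration only.

```python
from typing import Dict, Set


def type_similarity(query_type_dist: Dict[str, float], entity_types: Set[str]) -> float:
    """Stand-in type-based similarity (assumption, not the authors' estimator):
    share of the query's target-type probability mass covered by the entity."""
    return sum(p for t, p in query_type_dist.items() if t in entity_types)


def soft_filter_score(
    term_score: float, query_type_dist: Dict[str, float], entity_types: Set[str]
) -> float:
    """Term-based score multiplied by the type-based similarity."""
    return term_score * type_similarity(query_type_dist, entity_types)
```

Unlike the strict filter, the type component here scales the score gradually, although an entity with no matching type still ends up at zero.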

Slide 16

Slide 16 text

Dimension: Retrieval Models
Interpolation model
$P(q \mid e) = (1 - \lambda) \cdot P(\theta^{T}_q \mid \theta^{T}_e) + \lambda \cdot P(\theta^{\mathcal{T}}_q \mid \theta^{\mathcal{T}}_e)$
The first component is the term-based similarity (superscript $T$), the second is the type-based similarity (superscript $\mathcal{T}$), and $\lambda$ is the weight of the type-based component.
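
A minimal sketch of the interpolation model, assuming both similarity scores are already available; the function and parameter names are illustrative.

```python
def interpolation_score(term_score: float, type_score: float, lam: float) -> float:
    """Linear mixture of term-based and type-based scores; lam weights the types."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * term_score + lam * type_score
```

Because the two components are added rather than multiplied, an entity whose type-based score is zero (for instance, because it has no type assignment) still keeps part of its term-based score whenever lam is below 1.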

Slide 17

Slide 17 text

Experimental Setup
- Test collection of DBpedia entities [1]
- Baseline: Mixture of Language Models (title and content)
- Idealized assumption of a target types oracle
- Settings for type assignments
[1] Krisztian Balog and Robert Neumayer. 2013. A Test Collection for Entity Search in DBpedia. In Proc. of SIGIR. 737–740.

Slide 18

Slide 18 text

Experimental Setup: Target Types Oracle
- An oracle provides us with the (distribution of) correct target types for a given query.
- Construction: given a query, take the union of all types of all its relevant entities.
- The probability of a type is proportional to the number of relevant entities having that type.
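
The oracle construction can be sketched as follows. The sketch assumes entity-type assignments are available as a dictionary and normalizes the counts so that they sum to one; the data layout and names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def oracle_target_types(
    relevant_entities: Iterable[str], entity_types: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Distribution over the union of types of a query's relevant entities."""
    counts: Counter = Counter()
    for entity in relevant_entities:
        # Each relevant entity contributes one count to each of its types.
        counts.update(entity_types.get(entity, set()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in counts.items()}
```

With three relevant entities, two typed Scientist and one typed Writer, the resulting distribution is Scientist 2/3 and Writer 1/3.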

Slide 19

Slide 19 text

Experimental Setup: Type Assignments
Two settings to deal with missing type assignments:
- 4TT: only entities that have types in all four type taxonomies. E.g., types for the entity Enrico Fermi:
    In DBpedia: Scientist
    In Freebase: award.award_winner, people.deceased_person, education.academic, ...
    In Wikipedia: Nobel laureates in Physics, University of Pisa alumni, ...
    In YAGO: ItalianPhysicists, NobelLaureatesInPhysics, AmericanPeopleOfItalianDescent, ...
- ALL: all available entities are allowed.
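
A minimal sketch of how the 4TT entity set could be derived, assuming type assignments are stored per taxonomy as a mapping from entity to its set of types; the data layout and names are illustrative.

```python
from typing import Dict, Set


def entities_in_4tt(assignments: Dict[str, Dict[str, Set[str]]]) -> Set[str]:
    """Entities with at least one type in every taxonomy.

    `assignments` maps taxonomy name -> entity -> set of types.
    """
    typed_per_taxonomy = [
        {entity for entity, types in entities.items() if types}
        for entities in assignments.values()
    ]
    if not typed_per_taxonomy:
        return set()
    return set.intersection(*typed_per_taxonomy)
```

The ALL setting corresponds to skipping this filter and keeping every entity, including those with missing assignments in some taxonomy.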

Slide 20

Slide 20 text

Research Questions
RQ1 What is the impact of the particular choice of type taxonomy on entity retrieval performance?
RQ2 How to represent hierarchical entity type information for entity retrieval?
RQ3 How to combine term-based and type-based information?

Slide 21

Slide 21 text

Results
- Wikipedia, in combination with the most specific type representation, performs best (for both 4TT and ALL).
- Highly significant improvements for all models in 4TT.

Slide 22

Slide 22 text

Results
RQ1 What is the impact of the particular choice of type taxonomy on entity retrieval performance?
- Wikipedia, in combination with the most specific type representation, performs best (for both 4TT and ALL).
- Highly significant improvements for all models in 4TT.

Slide 23

Slide 23 text

Results
RQ2 How to represent hierarchical entity type information for entity retrieval?
- Using the most specific types in the hierarchy provides the best performance.
- There is no evidence that hierarchical relationships from ancestor types benefit retrieval effectiveness.

Slide 24

Slide 24 text

Results
RQ3 How to combine term-based and type-based information?
- In the 4TT setting, strict filtering is the best retrieval model.
- Only the interpolation model can deal robustly with the loss of type assignments in the ALL setting.

Slide 25

Slide 25 text

Results

Slide 26

Slide 26 text

Summary of Findings
- Using the most specific types is the most effective way to represent hierarchical entity type information.
- Wikipedia performs best among the type taxonomies in most cases.
- All models that combine term- and type-based information suffer from missing type information, but interpolation appears to be the most robust.

Slide 27

Slide 27 text

An Instance of Query-level Analysis
Query: italian nobel prize winners
- Baseline. MAP: 0.1607

Slide 28

Slide 28 text

An Instance of Query-level Analysis
Query: italian nobel prize winners
- Baseline. MAP: 0.1607
- DBpedia, most specific, soft filter. MAP: 0.1829. Target types: Artist, Scientist, Writer.

Slide 29

Slide 29 text

An Instance of Query-level Analysis
Query: italian nobel prize winners
- Baseline. MAP: 0.1607
- DBpedia, most specific, soft filter. MAP: 0.1829. Target types: Artist, Scientist, Writer.
- Wikipedia, most specific, interpolation (0.95). MAP: 0.8518. Target types: Italian Nobel laureates, Nobel laureates in Literature, ...
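
For a single query, the MAP figure reduces to the average precision of that query's ranking. A minimal sketch of the standard definition follows, assuming a ranked list of entity identifiers without duplicates and a set of relevant entities; the names are illustrative, and the figures above come from the authors' evaluation, not from this code.

```python
from typing import Sequence, Set


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at the ranks where relevant entities appear."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant entities that are never retrieved contribute zero.
    return precision_sum / len(relevant)
```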

Slide 30

Slide 30 text

What is in a Target Type?
What portion of relevant entities can target types capture?
Top-K types   Type taxonomy   P        R        F1
K = 1         DBpedia         0.0027   0.5863   0.0046
K = 1         Freebase        0.0060   0.7254   0.0076
K = 1         Wikipedia       0.1147   0.4798   0.1287
K = 1         YAGO            0.0418   0.6303   0.0488
K = 3         DBpedia         0.0006   0.7199   0.0012
K = 3         Freebase        0.0004   0.7805   0.0008
K = 3         Wikipedia       0.0402   0.5847   0.0614
K = 3         YAGO            0.0036   0.7025   0.0062
Fine-grained types in the Wikipedia category graph capture a subset of the relevant entities with the highest P and F1.
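
One plausible way to compute such figures for a single query is sketched below. It assumes that the entities "captured" by a set of target types are all entities assigned at least one of those types, and that P, R, and F1 compare this set with the relevant entities. The slide does not spell out the exact procedure, so this reading and all names are assumptions.

```python
from typing import Dict, Set, Tuple


def coverage_prf(
    target_types: Set[str], entity_types: Dict[str, Set[str]], relevant: Set[str]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of the entities captured by the target types."""
    covered = {e for e, types in entity_types.items() if types & target_types}
    hits = len(covered & relevant)
    precision = hits / len(covered) if covered else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under this reading, a coarse type that covers a very large number of entities gets high recall but very low precision, while a fine-grained type trades some recall for much higher precision.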

Slide 31

Slide 31 text

Conclusions and Future Work
In this work:
- We identify and systematically compare distinct dimensions of type information in type-aware entity retrieval.
- We observe that type information proves most useful when larger, deeper type taxonomies provide very specific types.
In future work:
- We plan to report further query-level analyses.
- We wish to re-assess the experiments using automatically identified target types [2].
[2] Darío Garigliotti, Faegheh Hasibi, and Krisztian Balog. 2017. Target Type Identification for Entity-Bearing Queries. In Proc. of SIGIR. 845–848.

Slide 32

Slide 32 text


Slide 33

Slide 33 text


Slide 34

Slide 34 text

Appendices

Slide 35

Slide 35 text

Appendix: Retrieval Model
Interpolation model
- For DBpedia and Freebase, increasing the weight of type-based information steadily hurts performance.
- For Wikipedia and YAGO, performance increases with a higher contribution of type information when using the most specific types.
[Figure 1: Interpolation performance (MAP, y-axis) for different type weights λ_t (x-axis, 0 to 1) in the 4TT setting, one curve per taxonomy (DBpedia, Freebase, Wikipedia, YAGO). Panels: (a) Path-to-top types, (b) Top-level types, (c) Most specific types.]

Slide 36

Slide 36 text

Appendix: Revisited Target Types Oracle
- The target types distribution of the default oracle includes all types associated with known relevant entities.
- Alternatively, we assess the configurations using a filtered oracle, which keeps only the target types that satisfy a threshold of coverage of relevant entities.
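
A minimal sketch of a filtered oracle, assuming that "coverage" means the fraction of a query's relevant entities that carry a given type and that the surviving types are renormalized. The threshold parameter, its interpretation, and all names are assumptions; the slide does not give the actual threshold.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def filtered_oracle(
    relevant_entities: Iterable[str],
    entity_types: Dict[str, Set[str]],
    min_coverage: float,
) -> Dict[str, float]:
    """Oracle distribution restricted to types covering enough relevant entities."""
    relevant = list(relevant_entities)
    if not relevant:
        return {}
    counts: Counter = Counter()
    for entity in relevant:
        counts.update(entity_types.get(entity, set()))
    # Keep a type only if its share of relevant entities reaches the threshold.
    kept = {t: c for t, c in counts.items() if c / len(relevant) >= min_coverage}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {t: c / total for t, c in kept.items()}
```

Setting `min_coverage` to 0 keeps every type, which corresponds to the default oracle.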

Slide 37

Slide 37 text

Appendix: Revisited Target Types Oracle
[Figure: MAP (y-axis, 0 to 0.3) per configuration for DBpedia, Freebase, Wikipedia, and YAGO, comparing the two target types oracles (default, filtered) across the three models (strict filtering, soft filtering, interpolation). Panels: (a) Path-to-top types, (b) Top-level types, (c) Most specific types.]
- The filtered oracle leads to considerable drops in performance for settings using the most specific types.
- It is important to consider all possible target types.