Task-Based Information Retrieval

Date: March 13, 2017
Venue: Stavanger, Norway. Doctoral Seminar at the IAI group for the research visit of Prof. Maarten de Rijke

Please cite, link to or credit this presentation when using it or part of it in your work.

#InformationRetrieval #IR #TaskBasedSearch #TaskCompletionEngines

Darío Garigliotti

March 13, 2017

Transcript

  1. Task-based Information Retrieval

    Darío Garigliotti, University of Stavanger, March 13, 2017. Sections: Target Types Identification, Type-aware Entity Retrieval, Query Suggestions.
  2. About me

    I am a Ph.D. candidate in Information Technology. I started my Ph.D. in November 2015. I hold an M.Sc. in Computer Science from Fa.M.A.F., National University of Córdoba, Argentina.
  3. Outline

    1. Target Types Identification; 2. Type-aware Entity Retrieval; 3. Query Suggestions
  4. Overview: from the library to the assistant

    Task: the underlying information need of a user. E.g., wanting to plan a trip and issuing the query "paris".
  5. Overview: from the library to the assistant

    Task: the underlying information need of a user. E.g., wanting to plan a trip and issuing the query "paris".
    - Document Retrieval: ranked list of relevant documents
  6. Overview: from the library to the assistant

    Task: the underlying information need of a user. E.g., wanting to plan a trip and issuing the query "paris".
    - Document Retrieval: ranked list of relevant documents
    - Entity-oriented Search: the entity Paris, its properties and relations
  7. Overview: from the library to the assistant

    Task: the underlying information need of a user. E.g., wanting to plan a trip and issuing the query "paris".
    - Document Retrieval: ranked list of relevant documents
    - Entity-oriented Search: the entity Paris, its properties and relations
    - Task-completion Search: booking/planning assistant
  8. Query Understanding: Target Types Identification

    [Diagram labels: Task-based Information Retrieval; Query Understanding; Target types; Target Types Identification]
  9. Target Types Identification: Motivation

    - Large proportion of entity-bearing queries
    - Query target types are automatically detected rather than provided
    - Target types help to reduce the search space
    - Types are organized in hierarchies (or taxonomies, or ontologies)
  10. Target Types Identification: Motivation

    E.g., buying a book on Amazon.
  11. Target Types Identification: Problem Definition

    Hierarchical Target Type Identification (HTTI) problem: to find the most specific single target type that is general enough to cover all relevant entities.
    - Many queries were discarded since they had no types
    - Some queries don't have a clear single type
    Our alternative definition relaxes both of these restrictions.
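One way to read the HTTI definition on slide 11 is as a lowest-common-ancestor search in the type taxonomy. The sketch below is only an illustration under that reading: the `PARENT` map is a toy taxonomy borrowed from the candidate types shown on slide 12, and the function names are hypothetical, not part of the presented work.

```python
def ancestors_or_self(type_id, parent):
    """Return type_id followed by its ancestors, from most to least specific."""
    chain = []
    while type_id is not None:
        chain.append(type_id)
        type_id = parent.get(type_id)  # None once a top-level type is passed
    return chain


def most_specific_covering_type(entity_types, parent):
    """Deepest type that is an ancestor (or equal) of every given type.

    entity_types: one type per relevant entity. Returns None (NIL) when the
    types share no common ancestor. Assumes the parent map has no cycles.
    """
    if not entity_types:
        return None
    common = set(ancestors_or_self(entity_types[0], parent))
    for t in entity_types[1:]:
        common &= set(ancestors_or_self(t, parent))
    # Walk upwards from the first type: the first common type is the deepest.
    for t in ancestors_or_self(entity_types[0], parent):
        if t in common:
            return t
    return None


# Toy taxonomy (child -> parent); top-level types have no entry.
PARENT = {
    "Person": "Agent",
    "Artist": "Person",
    "Musical artist": "Artist",
    "Musical work": "Work",
    "Album": "Musical work",
    "Single": "Musical work",
}

covering = most_specific_covering_type(["Album", "Single"], PARENT)
```

With albums and singles as the relevant entities, the covering type is "Musical work"; mixing "Album" with "Musical artist" yields no single type, which is the situation the relaxed definition is meant to handle.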
  12. Target Types Identification: Test collection

    - A new test collection with around 500 queries, built with a crowdsourcing experiment
    - Human annotators chose a most specific type, possibly NIL

    Query: ratt albums
    Candidate types:
    1. Agent
    1.1. Person
    1.1.1. Artist
    1.1.1.1. Musical artist
    2. Work
    2.1. Musical work
    2.1.1. Album
    2.1.2. Single
    - None of these types
    Correct type: 2.1.1. Album

    [Figure: number of queries (y-axis, 0 to 300) by number of main types (x-axis, 1 to 4), for queries with and without a NIL type]
  13. Target Types Identification: Approaches

    Baselines:
    - Entity-centric (EC): rank entities based on their relevance to the query, then look at the types of the top-k entities
    - Type-centric (TC): build a term-based representation for each type by aggregating the descriptions of its assigned entities
    Our approach: a learning-to-rank (LTR) method with a variety of features.
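The entity-centric baseline from slide 13 can be sketched as follows. This is a minimal illustration, not the authors' implementation: summing the retrieval scores of the top-k entities per type is an assumed aggregation (the slide only says to look at the types of the top-k entities), and the entity ranking and scores are toy data.

```python
from collections import defaultdict


def entity_centric_type_scores(ranked_entities, entity_types, k=10):
    """Score each type by summing the scores of top-k entities assigned to it.

    ranked_entities: list of (entity_id, score), sorted by descending score.
    entity_types: dict mapping entity_id -> iterable of type labels.
    Returns (type, score) pairs sorted by descending score, then by label.
    """
    type_scores = defaultdict(float)
    for entity_id, score in ranked_entities[:k]:
        for type_label in entity_types.get(entity_id, ()):
            type_scores[type_label] += score  # assumed aggregation: sum
    return sorted(type_scores.items(), key=lambda item: (-item[1], item[0]))


# Toy ranking for the query "ratt albums"; the scores are made up.
ranked = [("entity_a", 3.0), ("entity_b", 2.5), ("entity_c", 2.0)]
types = {
    "entity_a": ["Album"],
    "entity_b": ["Musical artist"],
    "entity_c": ["Album"],
}
scores = entity_centric_type_scores(ranked, types, k=3)
```

Varying `k` changes which entities contribute, which is why the feature set on slide 14 includes entity-centric scores for several cut-offs.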
  14. Features for learning to rank target types

    Baseline features
    - 1-5: EC_BM25,K(t, q): entity-centric type score with K ∈ {5, 10, 20, 50, 100}, using BM25
    - 6-10: EC_LM,K(t, q): entity-centric type score with K ∈ {5, 10, 20, 50, 100}, using LM
    - 11: TC_BM25(t, q): type-centric score using BM25
    - 12: TC_LM(t, q): type-centric score using LM

    Knowledge base features
    - 13: DEPTH(t): the hierarchical level of type t, normalized by the taxonomy depth
    - 14: CHILDREN(t): number of children of type t in the taxonomy
    - 15: SIBLINGS(t): number of siblings of type t in the taxonomy
    - 16: ENTITIES(t): number of entities mapped to type t

    Type label features
    - 17: LENGTH(t): length of (the label of) type t in words
    - 18: IDFSUM(t): sum of IDF for terms in (the label of) type t
    - 19: IDFAVG(t): average of IDF for terms in (the label of) type t
    - 20-21: JTERMS_n(t, q): query-type Jaccard similarity for sets of n-grams, for n ∈ {1, 2}
    - 22: JNOUNS(t, q): query-type Jaccard similarity using only nouns
    - 23: SIMAGGR(t, q): cosine similarity between the q and t word2vec vectors aggregated over all terms
    - 24: SIMMAX(t, q): maximum cosine similarity of word2vec vectors between each pair of query (q) and type (t) terms
    - 25: SIMAVG(t, q): average cosine similarity of word2vec vectors between each pair of query (q) and type (t) terms
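As an illustration of the type label features, the JTERMS feature (query-type Jaccard similarity over n-gram sets) could be computed roughly as below. The whitespace tokenization and lowercasing are assumptions made for this sketch; the slide does not say how terms are preprocessed.

```python
def ngrams(text, n):
    """Set of word n-grams of a text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jterms(type_label, query, n=1):
    """Jaccard similarity between the n-gram sets of a type label and a query."""
    type_grams = ngrams(type_label, n)
    query_grams = ngrams(query, n)
    union = type_grams | query_grams
    if not union:
        return 0.0  # both sides have no n-grams of this length
    return len(type_grams & query_grams) / len(union)


unigram_sim = jterms("art museum", "art museums in Amsterdam", n=1)
bigram_sim = jterms("art museum", "art museums in Amsterdam", n=2)
```

Without stemming, "museum" and "museums" do not match, so only "art" overlaps at the unigram level and nothing overlaps at the bigram level.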
  15. Target Types Identification: Results and Insights

    [Figure: Identification performances across query groups. x-axis: INEX_LD, ListSearch, QALD2, SemSearch_ES; y-axis: NDCG@5 (0.0 to 0.8); series: EC (LM), TC (LM), LTR]
  16. Target Types Identification: Results and Insights

    [Figure: Gini score (axis 0.0 to 0.16) and NDCG@5 (axis 0.35 to 0.70) per feature. Features on the x-axis, in the order shown: SIMMAX(t,q), SIMAGGR(t,q), SIMAVG(t,q), TC_BM25(t,q), ENTITIES(t), EC_BM25,100(t,q), EC_BM25,50(t,q), SIBLINGS(t), EC_BM25,20(t,q), CHILDREN(t), IDFSUM(t), IDFAVG(t), JNOUNS(t,q), EC_BM25,10(t,q), JTERMS_1(t,q), EC_BM25,5(t,q), DEPTH(t), EC_LM,100(t,q), LENGTH(t), EC_LM,50(t,q), TC_LM(t,q), EC_LM,20(t,q), EC_LM,10(t,q), EC_LM,5(t,q), JTERMS_2(t,q)]
  17. Task-based IR

    [Diagram labels: Task-based Information Retrieval; Query Understanding; Target types; Target Types Identification]
  18. Task-based IR: Types and Entity Retrieval

    [Diagram labels: Task-based Information Retrieval; Query Understanding; Entities; Types; Target types; Type-aware Entity Retrieval; Target Types Identification]
  19. Type-aware Entity Retrieval

    - A characteristic property of entities is that they are typed
    - Types naturally appear in many queries, e.g., "countries where one can pay with the euro" or "art museums in Amsterdam"
    - Types have been shown to improve Entity Retrieval
  20. Dimensions of Type Information

    We systematically identified and compared all combinations of 3 dimensions:
    - Type taxonomies
    - Type representations
    - Retrieval models
  21. Dimensions: Type Taxonomies

    Which type taxonomy to use?
    - DBpedia Ontology (7 levels, 600 types)
    - Freebase Types (2 levels, 2K types)
    - Wikipedia Categories (34 levels, 600K types)
    - YAGO Taxonomy (19 levels, 500K types)
    These vary a lot in terms of hierarchical structure and in how entity-type assignments are recorded.
  22. Dimensions: Type Representations

    How to represent the hierarchical information?
    - Type(s) along path to top
    - Top-level type(s)
    - Most specific type(s)
    [Diagrams: one taxonomy tree per representation, with types t0 to t12 and an entity e]
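The three type representations named on slide 22 can be derived from a child-to-parent map. The sketch below is an assumption-laden illustration: the `PARENT` map is a toy taxonomy reusing type names from slide 12, "top-level" is taken to mean a type without a parent, and the function names are hypothetical.

```python
def path_to_top(type_id, parent):
    """Return type_id followed by its ancestors, ending at a top-level type."""
    path = []
    current = type_id
    while current is not None:
        path.append(current)
        current = parent.get(current)  # None once a top-level type is passed
    return path


def type_representations(assigned_types, parent):
    """Build the three representations for one entity's assigned types."""
    along_path = set()
    for t in assigned_types:
        along_path.update(path_to_top(t, parent))
    # Assumed reading of "top-level": types with no parent in the taxonomy.
    top_level = {t for t in along_path if parent.get(t) is None}
    # A type is most specific if it is not a proper ancestor of another type.
    proper_ancestors = set()
    for t in along_path:
        proper_ancestors.update(path_to_top(t, parent)[1:])
    most_specific = along_path - proper_ancestors
    return {
        "along_path": along_path,
        "top_level": top_level,
        "most_specific": most_specific,
    }


# Toy taxonomy (child -> parent); assumed to be acyclic.
PARENT = {
    "Person": "Agent",
    "Artist": "Person",
    "Musical artist": "Artist",
    "Musical work": "Work",
    "Album": "Musical work",
    "Single": "Musical work",
}

reps = type_representations({"Album", "Musical work"}, PARENT)
```

For an entity assigned both "Album" and "Musical work", the path representation contains "Album", "Musical work" and "Work", the top-level representation contains only "Work", and the most specific representation contains only "Album".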
  23. Dimensions: Retrieval Models

    How to add type information into entity retrieval?
    - Retrieval task defined in a generative probabilistic framework: P(q | e)
    - Both query and entity are considered in the term space as well as in the type space
    [Diagram: a query (e.g., "Olympic games", with target types) is compared with an entity (e.g., Rio de Janeiro, with entity types) through a term-based similarity and a type-based similarity]
  24. Dimensions: Retrieval Models

    (Strict) Filtering model:
    $P(q \mid e) = P(\theta_q^{T} \mid \theta_e^{T}) \cdot \chi[\mathit{types}(q) \cap \mathit{types}(e) \neq \emptyset]$
    Here $\theta^{T}$ denotes a term-based representation, and $\chi$ is an indicator that is 1 when the query and the entity share at least one type and 0 otherwise.
    [Diagram: the sets Types(q) and Types(e)]
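A minimal sketch of the strict filtering model, assuming the indicator keeps an entity only when it shares at least one type with the query. The term-based score is taken as a given number here, and the function name is hypothetical.

```python
def strict_filter_score(term_score, query_types, entity_types):
    """Keep the term-based score only if query and entity share a type.

    term_score: term-based similarity between query and entity.
    query_types, entity_types: iterables of type labels.
    """
    shares_type = bool(set(query_types) & set(entity_types))
    return term_score if shares_type else 0.0


kept = strict_filter_score(0.4, {"Album"}, {"Album", "Musical work"})
dropped = strict_filter_score(0.4, {"Album"}, {"Person"})
untyped = strict_filter_score(0.4, {"Album"}, set())
```

An entity with no recorded types is filtered out regardless of its term-based score, which illustrates why missing type information hurts this model.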
  25. Dimensions: Retrieval Models

    (Soft) Filtering model:
    $P(q \mid e) = P(\theta_q^{T} \mid \theta_e^{T}) \cdot P(\theta_q^{\mathcal{T}} \mid \theta_e^{\mathcal{T}})$
    Here $\theta^{T}$ denotes a term-based representation and $\theta^{\mathcal{T}}$ a type-based representation.
  26. Dimensions: Retrieval Models

    Interpolation model:
    $P(q \mid e) = (1 - \lambda) \cdot P(\theta_q^{T} \mid \theta_e^{T}) + \lambda \cdot P(\theta_q^{\mathcal{T}} \mid \theta_e^{\mathcal{T}})$
    Here $\theta^{T}$ denotes a term-based representation and $\theta^{\mathcal{T}}$ a type-based representation.
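The soft filtering and interpolation models differ only in how the term-based and type-based similarities are combined. The sketch below takes both similarities as given numbers; the function names and the default value of `lam` are assumptions for illustration.

```python
def soft_filter_score(term_score, type_score):
    """Soft filtering: multiply term-based and type-based similarities."""
    return term_score * type_score


def interpolation_score(term_score, type_score, lam=0.5):
    """Interpolation: weighted sum with mixing weight lam in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    return (1.0 - lam) * term_score + lam * type_score


# An entity that matches the query terms well but has no type evidence.
soft = soft_filter_score(0.4, 0.0)
mixed = interpolation_score(0.4, 0.0, lam=0.25)
```

With a type-based similarity of zero, the product drops the entity entirely, while the interpolated score only shrinks. This is one way to see why interpolation can be more robust when type information is missing.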
  27. Type-aware Entity Retrieval: Lessons Learned

    Summary of insights:
    - Type information proves most useful when larger, deeper type taxonomies provide very specific types.
    - How to represent hierarchical entity type information? Using the most specific types is the most effective way.
    - What (kind of) type taxonomies to use? Wikipedia performs best in most cases.
    - What combination model to choose? All models suffer from missing type information, but interpolation appears to be the most robust.
  28. Task-based IR: Query Suggestions

    [Diagram labels: Task-based Information Retrieval; Query Understanding; Entities; Types; Target types; Type-aware Entity Retrieval; Target Types Identification]
  29. Task-based IR: Query Suggestions

    [Diagram labels: Task-based Information Retrieval; Query Suggestions; Query Understanding; Entities; Types; Target types; Type-aware Entity Retrieval; Target Types Identification]
  30. Query Suggestions

    - Investigated using the setup of the TREC Tasks track
    - Task understanding: given an initial query, return a ranked list of query suggestions that cover all the possible subtasks of the task
    - Participation in the track
    - Formalization and analysis of our approach
  31. Query Suggestions: Architecture and Model

    [Diagram labels: q0; QS; WS; WD; WH; Keyphrases; Query suggestions]
    Components:
    - Source importance
    - Document importance
    - Keyphrase relevance
    - Query suggestion
  32. Query Suggestions: Architecture and Model

    [Diagram labels: q0; QS; WS; WD; WH; Keyphrases; Query suggestions]
    Components:
    - Source importance
    - Document importance
    - Keyphrase relevance
    - Query suggestion
    $P(q \mid q_0) = \sum_{s} \sum_{d} \sum_{k} P(q \mid q_0, s, k) \, P(k \mid s, d) \, P(d \mid q_0, s) \, P(s \mid q_0)$
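The generative model on slide 32 sums over sources s, documents d and keyphrases k. The sketch below evaluates that sum over toy probability tables; the nested-dictionary layout, the source and document names, and all numbers are hypothetical. It reads P(s | q0) as source importance, P(d | q0, s) as document importance, P(k | s, d) as keyphrase relevance and P(q | q0, s, k) as the query suggestion component, which is an assumed mapping to the listed components.

```python
def suggestion_probability(q, p_source, p_doc, p_keyphrase, p_suggestion):
    """Marginalize over sources, documents and keyphrases.

    p_source:     {s: P(s | q0)}
    p_doc:        {s: {d: P(d | q0, s)}}
    p_keyphrase:  {(s, d): {k: P(k | s, d)}}
    p_suggestion: {(s, k): {q: P(q | q0, s, k)}}
    """
    total = 0.0
    for s, ps in p_source.items():
        for d, pd in p_doc.get(s, {}).items():
            for k, pk in p_keyphrase.get((s, d), {}).items():
                pq = p_suggestion.get((s, k), {}).get(q, 0.0)
                total += pq * pk * pd * ps
    return total


# Toy tables for an initial query q0 = "paris"; all values are made up.
p_source = {"s1": 0.5, "s2": 0.5}
p_doc = {"s1": {"d1": 1.0}, "s2": {"d2": 0.5, "d3": 0.5}}
p_keyphrase = {
    ("s1", "d1"): {"paris hotels": 1.0},
    ("s2", "d2"): {"paris hotels": 0.5, "paris flights": 0.5},
    ("s2", "d3"): {"paris flights": 1.0},
}
# Keyphrases used as-is: each keyphrase generates itself as the suggestion.
p_suggestion = {
    ("s1", "paris hotels"): {"paris hotels": 1.0},
    ("s2", "paris hotels"): {"paris hotels": 1.0},
    ("s2", "paris flights"): {"paris flights": 1.0},
}

p_hotels = suggestion_probability(
    "paris hotels", p_source, p_doc, p_keyphrase, p_suggestion)
p_flights = suggestion_probability(
    "paris flights", p_source, p_doc, p_keyphrase, p_suggestion)
```

Here "paris hotels" receives 0.5 from the first source plus 0.125 from the second, and the two candidate suggestions together account for all of the probability mass.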
  37. Query Suggestions: Component Estimations

    Insights from best component estimations:
    - Query suggestion: keyphrases as-is (vs. generations)
    - Document importance: uniform (vs. rank-based)
    - Source importance: proportional to the best document importance per individual source (vs. uniform, or source group-based)
    Overall, a high contribution of API query suggestions.
  38. Task-based IR

    [Diagram labels: Task-based Information Retrieval; Query Suggestions; Query Understanding; Entities; Types; Target types; Type-aware Entity Retrieval; Target Types Identification]
  39. Task-based IR: Future work

    [Diagram labels: Task-based Information Retrieval; Query Suggestions; Query Understanding; Entities; Types; Target types; Subtasks; Type-aware Entity Retrieval; Target Types Identification; Subtasks Identification]
  40. Task-based IR: Future work

    [Diagram labels: Task-based Information Retrieval; Query Suggestions; Query Understanding; Entities; Types; Target types; Subtasks; Type-aware Entity Retrieval; Target Types Identification; Linking subtasks to target types; Subtasks Identification]
  41. Task-based IR: Future work

    Towards a formal task model:
    - Subtasks, i.e., clusters of information needs
    - Relationship with query types involved
    - Specific entities involved (?)
    - ...