Learning-to-Rank Target Types for Entity-Bearing Queries

Date: October 1st, 2017
Venue: Amsterdam, The Netherlands. LEARNER 2017, co-located with the 2017 ACM SIGIR International Conference on Theory of Information Retrieval (ICTIR '17)
Corresponding article: http://ceur-ws.org/Vol-2007/LEARNER2017_short_3.pdf

Please cite, link to or credit this presentation when using it or part of it in your work.

#InformationRetrieval #IR #EntityRanking #EntityRetrieval #ER #EntityTypes #EntityOrientedSearch #KnowledgeBases #SemanticSearch #LearningToRank #LTR

Darío Garigliotti

Transcript

  1. Learning-to-Rank Target Types for Entity-Bearing Queries
    Darío Garigliotti, University of Stavanger
    ICTIR 2017 LEARNER - 1st International Workshop on LEARning Next gEneration Rankers
    Amsterdam, The Netherlands - October 1st, 2017
  2. Why Detect the Target Types of a Query?
    Type information has been shown to improve entity retrieval.
    More details? Tomorrow at Session 1: "On Type-Aware Entity Retrieval" by Darío Garigliotti and Krisztian Balog.
  3. Learning-to-Rank Target Types
    We proposed a Learning-to-Rank approach for automatically identifying the target types of a query.
    Reference: Darío Garigliotti, Faegheh Hasibi, and Krisztian Balog. 2017. Target Type Identification for Entity-Bearing Queries. In Proc. of SIGIR. 845–848.
  4. Contributions
    A purpose-built test collection:
        485 queries from a test collection for entity retrieval [1]
        Annotated with DBpedia types via crowdsourcing
    A Learning-to-Rank (LTR) method, with a variety of features:
        Random Forest algorithm
        1000 trees (iterations)
        Maximum number of features in each tree equal to 10% of the size of the feature set
    [1] Krisztian Balog and Robert Neumayer. 2013. A Test Collection for Entity Search in DBpedia. In Proc. of SIGIR. 737–740.
  5. Baselines
    Entity-centric (EC): rank entities based on their relevance to the query, then look at the types of the top-k entities.
    Type-centric (TC): build a term-based representation for each type by aggregating the descriptions of the entities assigned to it.
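The two baselines can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy entity ranking, the `entity_types` and `descriptions` mappings, the function names, and the choice to sum retrieval scores over the top-k entities. The slides do not specify the aggregation function or the details of the retrieval model.

```python
from collections import Counter, defaultdict


def entity_centric_scores(ranked_entities, entity_types, k=10):
    """Score each type by summing the retrieval scores of the top-k entities carrying it.

    ranked_entities: list of (entity, score) pairs, best first (hypothetical input).
    entity_types: dict mapping an entity to the list of its types (hypothetical input).
    Summation is an assumed aggregation; other choices are possible.
    """
    type_scores = defaultdict(float)
    for entity, score in ranked_entities[:k]:
        for t in entity_types.get(entity, []):
            type_scores[t] += score
    return sorted(type_scores.items(), key=lambda item: item[1], reverse=True)


def type_centric_representations(entity_types, descriptions):
    """Build one bag of words per type by pooling the descriptions of its entities."""
    type_terms = defaultdict(Counter)
    for entity, types in entity_types.items():
        terms = descriptions.get(entity, "").lower().split()
        for t in types:
            type_terms[t].update(terms)
    return dict(type_terms)


# Toy data, invented for this example.
entity_types = {
    "Amsterdam": ["City", "Place"],
    "Stavanger": ["City", "Place"],
    "Rijksmuseum": ["Museum", "Place"],
}
descriptions = {
    "Amsterdam": "capital city of the Netherlands",
    "Stavanger": "city in Norway",
    "Rijksmuseum": "national museum in Amsterdam",
}
ranked = [("Amsterdam", 3.0), ("Rijksmuseum", 2.0), ("Stavanger", 1.0)]

ec_ranking = entity_centric_scores(ranked, entity_types, k=2)
tc_docs = type_centric_representations(entity_types, descriptions)
```

In the type-centric case, the pooled bag of words for each type would then be scored against the query with a standard retrieval model; that scoring step is left out here.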
  6. Results
    Our approach significantly outperforms all baselines:

    Method            NDCG@1  NDCG@5
    EC, LM            0.1417  0.3161
    TC, LM            0.2341  0.3780
    Learning-to-Rank  0.4842  0.6355

    It is robust across different query categories.
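For readers unfamiliar with the metric, the following is a minimal sketch of one common NDCG@k formulation (linear gain, log2 rank discount). The slides do not state which gain variant was used in the evaluation, so the formulation and the function names are assumptions.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain, log2 discount)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_gains, all_gains, k):
    """NDCG@k: DCG of the system ranking divided by the DCG of the ideal ranking.

    ranked_gains: relevance grades of the returned types, in ranked order.
    all_gains: relevance grades of all judged types for the query (any order).
    """
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


# Toy example: three judged types with grades 2, 1 and 0, returned in the order 0, 2, 1.
score_at_1 = ndcg_at_k([0, 2, 1], [0, 2, 1], k=1)
score_at_3 = ndcg_at_k([0, 2, 1], [0, 2, 1], k=3)
```

With this definition, NDCG@1 only rewards placing the best type first, while NDCG@5 also gives partial credit for relevant types further down the ranking.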
  7. Feature Analysis
    [Figure: Gini score (axis range 0.0 to 0.16) and NDCG@5 (axis range 0.35 to 0.70) plotted per feature. Features, in the order shown on the axis: SIM_MAX(t, q), SIM_AGGR(t, q), SIM_AVG(t, q), TC_BM25(t, q), ENTITIES(t), EC_BM25,100(t, q), EC_BM25,50(t, q), SIBLINGS(t), EC_BM25,20(t, q), CHILDREN(t), IDFSUM(t), IDFAVG(t), JNOUNS(t, q), EC_BM25,10(t, q), JTERMS_1(t, q), EC_BM25,5(t, q), DEPTH(t), EC_LM,100(t, q), LENGTH(t), EC_LM,50(t, q), TC_LM(t, q), EC_LM,20(t, q), EC_LM,10(t, q), EC_LM,5(t, q), JTERMS_2(t, q).]
  8. Feature Analysis
    Query-type semantic similarities, e.g.:
        Max. cosine similarity of word2vec vectors between each pair of query and type terms
        Cosine similarity between the query and type aggregated word2vec vectors over all terms (centroids)
    [Figure: the per-feature Gini score and NDCG@5 plot from slide 7, repeated.]
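The two similarity features described on this slide can be sketched in a few lines of Python. The function names, the toy 2-dimensional vectors and the `emb` lookup table are assumptions for illustration; real word2vec vectors have hundreds of dimensions, and this sketch assumes every term is in the vocabulary and that both term lists are non-empty.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def sim_max(query_terms, type_terms, emb):
    """Maximum cosine similarity over all (query term, type term) pairs."""
    return max(cosine(emb[q], emb[t]) for q in query_terms for t in type_terms)


def sim_aggr(query_terms, type_terms, emb):
    """Cosine similarity between the query centroid and the type-label centroid."""
    query_centroid = centroid([emb[q] for q in query_terms])
    type_centroid = centroid([emb[t] for t in type_terms])
    return cosine(query_centroid, type_centroid)


# Toy "embeddings", invented for illustration only.
emb = {
    "dutch": [1.0, 0.0],
    "painters": [0.0, 1.0],
    "artist": [0.0, 2.0],
}
max_similarity = sim_max(["dutch", "painters"], ["artist"], emb)
aggr_similarity = sim_aggr(["dutch", "painters"], ["artist"], emb)
```

The pairwise maximum rewards a single strongly matching term, whereas the centroid variant compares the query and the type label as wholes, so the two features can disagree on multi-term queries.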
  9. Feature Analysis
    Baseline signals, e.g.:
        Type-centric score using BM25
        Entity-centric score with top-20 entities using BM25
    [Figure: the per-feature Gini score and NDCG@5 plot from slide 7, repeated.]
  10. Feature Analysis
    Knowledge base features, e.g., number of entities the type covers
    Type label statistics, e.g., length in chars
    Query-type lexical similarities, e.g., Jaccard
    [Figure: the per-feature Gini score and NDCG@5 plot from slide 7, repeated.]
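As a rough illustration of the three feature families named on this slide, the sketch below computes one example of each. The function names, the term-set form of Jaccard, and the toy `entity_types` mapping are assumptions; the slides do not say how the terms are tokenized or how the knowledge base is queried.

```python
def entity_count(type_name, entity_types):
    """Knowledge base feature: number of entities assigned to the type."""
    return sum(1 for types in entity_types.values() if type_name in types)


def label_length(type_label):
    """Type label statistic: length of the label in characters."""
    return len(type_label)


def jaccard(query_terms, type_terms):
    """Lexical similarity: size of the term overlap divided by the size of the union."""
    q, t = set(query_terms), set(type_terms)
    union = q | t
    return len(q & t) / len(union) if union else 0.0


# Toy knowledge base fragment, invented for this example.
entity_types = {
    "Rembrandt": ["Artist", "Person"],
    "Vermeer": ["Artist", "Person"],
    "Amsterdam": ["City"],
}

n_artists = entity_count("Artist", entity_types)
n_chars = label_length("MusicalArtist")
overlap = jaccard(["musical", "artist"], ["artist"])
```

Features of the first two kinds depend only on the type, so they could be computed once per type and reused across queries; the Jaccard feature has to be computed per query-type pair.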
  11. Next Generation of LTR-based Target Type Identification
    Some insights and challenges:
        Generalization to other type systems
        Training data (acquisition) bottleneck
        Neural-based features
  12. Generalization to Other Type Systems
    Does this method (or do these features) generalize to other type systems?
    How can the acquisition of such type labels be made easier? E.g., by knowledge transfer from the test collection of DBpedia target types.
  13. Training Data Acquisition
    Would more training data be beneficial?
    How can high-quality labeled data be collected? E.g., automatically, by weak supervision.
  14. Neural-based Features
    We use semantic similarities based on pre-trained word embeddings.
    There is a trend of using neural features in an LTR approach.
    Can we go fully neural?