Learning-to-Rank Target Types for Entity-Bearing Queries

Learning-to-Rank Target Types for Entity-Bearing Queries
Darío Garigliotti, University of Stavanger
ICTIR 2017 LEARNER - 1st International Workshop on LEARning Next gEneration Rankers
Amsterdam, The Netherlands, October 1st, 2017

What Are Query Target Types?

Why Detect the Target Types of a Query?
Type information has been shown to improve entity retrieval.
More details? Tomorrow at Session 1: "On Type-Aware Entity Retrieval" by Darío Garigliotti and Krisztian Balog.

Learning-to-Rank Target Types
We proposed a learning-to-rank approach for automatically identifying the target types of a query:
Darío Garigliotti, Faegheh Hasibi, and Krisztian Balog. 2017. Target Type Identification for Entity-Bearing Queries. In Proc. of SIGIR. 845–848.
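To make the ranking framing concrete, here is a minimal sketch of a pointwise learning-to-rank setup: each candidate type is paired with the query, mapped to a feature vector, and scored independently, and the types are then sorted by score. That the method is pointwise is an assumption here (this slide does not say), and `rank_types`, `toy_features`, and `toy_score` are hypothetical names; `toy_score` only stands in for the trained Random Forest.

```python
from typing import Callable, Dict, List, Tuple

FeatureVector = Dict[str, float]

def rank_types(
    query: str,
    candidate_types: List[str],
    extract_features: Callable[[str, str], FeatureVector],
    score: Callable[[FeatureVector], float],
) -> List[Tuple[str, float]]:
    """Score each (query, type) pair independently, then sort types best-first."""
    scored = [(t, score(extract_features(query, t))) for t in candidate_types]
    # Descending score; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored

# Hypothetical toy feature extractor: 1.0 if the type label appears as a query term.
def toy_features(query: str, type_label: str) -> FeatureVector:
    return {"label_in_query": 1.0 if type_label.lower() in query.lower().split() else 0.0}

# Hypothetical stand-in for the trained model's prediction (NOT the actual model).
def toy_score(features: FeatureVector) -> float:
    return sum(features.values())

ranking = rank_types("italian film directors", ["Person", "Film", "Place"], toy_features, toy_score)
```

Swapping `toy_score` for a trained regressor's prediction function leaves `rank_types` unchanged, which is the point of keeping feature extraction and scoring separate.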

Contributions
- A purpose-built test collection:
  - 485 queries from a test collection for entity retrieval [1]
  - Annotated with DBpedia types via crowdsourcing
- A learning-to-rank (LTR) method with a variety of features:
  - Random Forest algorithm with 1000 trees (iterations)
  - Maximum number of features in each tree equal to 10% of the size of the feature set

[1] Krisztian Balog and Robert Neumayer. 2013. A Test Collection for Entity Search in DBpedia. In Proc. of SIGIR. 737–740.
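The 10% feature budget can be computed as below. This is a sketch under the assumption that the fraction is rounded down and never drops below one feature; the function names are hypothetical. For comparison, scikit-learn interprets a float `max_features` as `max(1, int(max_features * n_features))`, but applies it at each split rather than per tree as the slide phrases it. The feature-analysis plot in this deck lists 25 features, which would give a budget of 2.

```python
import math
import random
from typing import List

N_TREES = 1000  # number of trees (iterations), as stated on the slide

def max_features_per_tree(n_features: int, fraction: float = 0.10) -> int:
    """Feature budget as a fraction of the feature set size, rounded down, never below 1."""
    return max(1, math.floor(fraction * n_features))

def sample_feature_subset(features: List[str], fraction: float = 0.10, seed: int = 0) -> List[str]:
    """Draw a random subset of features (without replacement) of the size given above."""
    k = max_features_per_tree(len(features), fraction)
    return random.Random(seed).sample(features, k)
```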

Baselines
- Entity-centric (EC): rank entities by their relevance to the query, then look at the types of the top-k entities
- Type-centric (TC): build a term-based representation for each type by aggregating the descriptions of the entities assigned to it
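Both baselines can be sketched in a few lines. Assumptions: the entity ranking is already given (from any retrieval model), each of the top-k entities adds its full retrieval score to every type it is assigned (the slide does not state the exact weighting), and the TC representation is scored with a Dirichlet-smoothed query-likelihood language model, with μ=2000 as a placeholder value. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

def ec_type_scores(
    entity_ranking: List[Tuple[str, float]],
    entity_types: Dict[str, Set[str]],
    k: int,
) -> Dict[str, float]:
    """Entity-centric: each top-k entity adds its retrieval score to every type it has."""
    scores: Dict[str, float] = {}
    for entity, relevance in entity_ranking[:k]:  # ranking assumed sorted best-first
        for t in entity_types.get(entity, set()):
            scores[t] = scores.get(t, 0.0) + relevance
    return scores

def build_type_documents(
    entity_descriptions: Dict[str, str],
    entity_types: Dict[str, Set[str]],
) -> Dict[str, Counter]:
    """Type-centric: a type's 'document' is the bag of words of its entities' descriptions."""
    docs: Dict[str, Counter] = {}
    for entity, description in entity_descriptions.items():
        terms = description.lower().split()
        for t in entity_types.get(entity, set()):
            docs.setdefault(t, Counter()).update(terms)
    return docs

def tc_lm_scores(query: str, type_docs: Dict[str, Counter], mu: float = 2000.0) -> Dict[str, float]:
    """Query log-likelihood of each type document under a Dirichlet-smoothed unigram LM."""
    collection: Counter = Counter()
    for tf in type_docs.values():
        collection.update(tf)
    coll_len = sum(collection.values())
    if coll_len == 0:
        return {t: 0.0 for t in type_docs}
    scores: Dict[str, float] = {}
    for t, tf in type_docs.items():
        doc_len = sum(tf.values())
        log_p = 0.0
        for w in query.lower().split():
            p_coll = collection[w] / coll_len
            if p_coll == 0.0:
                # Unseen everywhere: log(0) for every type, so skipping keeps the ranking unchanged.
                continue
            log_p += math.log((tf[w] + mu * p_coll) / (doc_len + mu))
        scores[t] = log_p
    return scores
```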

Results
Our approach significantly outperforms all baselines:

Method              NDCG@1   NDCG@5
EC, LM              0.1417   0.3161
TC, LM              0.2341   0.3780
Learning-to-Rank    0.4842   0.6355

It is robust across different query categories.
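For reference, the NDCG@k metric used in the table can be computed as sketched below. The gain function (2^rel - 1), the log2(rank + 1) discount, and the graded relevance levels in the test are assumptions; the slides do not say which NDCG variant or label scale was used.

```python
import math
from typing import Dict, List

def dcg_at_k(gains: List[int], k: int) -> float:
    """Discounted cumulative gain over the first k positions (i is the 0-based position)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ordering of judged types."""
    gains = [qrels.get(item, 0) for item in ranking]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Because NDCG@1 only looks at the top-ranked type, it is much stricter than NDCG@5, which is consistent with the lower NDCG@1 values for every method in the table.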

Feature Analysis
[Figure: Gini score (bars) and NDCG@5 (line) per feature; x-axis features, in order: SIM_MAX(t,q), SIM_AGGR(t,q), SIM_AVG(t,q), TC_BM25(t,q), ENTITIES(t), EC_BM25,100(t,q), EC_BM25,50(t,q), SIBLINGS(t), EC_BM25,20(t,q), CHILDREN(t), IDF_SUM(t), IDF_AVG(t), JNOUNS(t,q), EC_BM25,10(t,q), JTERMS1(t,q), EC_BM25,5(t,q), DEPTH(t), EC_LM,100(t,q), LENGTH(t), EC_LM,50(t,q), TC_LM(t,q), EC_LM,20(t,q), EC_LM,10(t,q), EC_LM,5(t,q), JTERMS2(t,q)]

Feature Analysis
Query-type semantic similarities, e.g.:
- Maximum cosine similarity of word2vec vectors between each pair of query and type terms
- Cosine similarity between the query and type word2vec vectors aggregated over all terms (centroids)
[Figure: per-feature Gini score and NDCG@5 plot, as on the previous slide]
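The two similarities described above can be sketched as follows. The toy 2-dimensional vectors in the test stand in for pre-trained word2vec embeddings, and the function names (which echo the SIM_MAX and SIM_AGGR labels in the plot) are hypothetical; dropping out-of-vocabulary terms and returning 0.0 when either side has no vectors are assumptions.

```python
import math
from typing import Dict, List, Sequence

Embeddings = Dict[str, Sequence[float]]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def _lookup(terms: List[str], emb: Embeddings) -> List[Sequence[float]]:
    """Vectors for terms that have an embedding; out-of-vocabulary terms are dropped."""
    return [emb[w] for w in terms if w in emb]

def sim_max(q_terms: List[str], t_terms: List[str], emb: Embeddings) -> float:
    """Maximum cosine similarity over all (query term, type term) pairs."""
    qv, tv = _lookup(q_terms, emb), _lookup(t_terms, emb)
    if not qv or not tv:
        return 0.0
    return max(cosine(a, b) for a in qv for b in tv)

def sim_aggr(q_terms: List[str], t_terms: List[str], emb: Embeddings) -> float:
    """Cosine similarity between the query centroid and the type centroid."""
    qv, tv = _lookup(q_terms, emb), _lookup(t_terms, emb)
    if not qv or not tv:
        return 0.0
    return cosine(centroid(qv), centroid(tv))
```

The max variant rewards a single strong term match, while the centroid variant reflects overall topical closeness of the query and the type label.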

Feature Analysis
Baseline signals, e.g.:
- Type-centric score using BM25
- Entity-centric score with the top-20 entities using BM25
[Figure: per-feature Gini score and NDCG@5 plot, as on the previous slide]
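A BM25 scorer that could produce such baseline signals is sketched below, applied to bag-of-words documents such as a type's aggregated entity descriptions. The parameters k1=1.2 and b=0.75 and the non-negative idf variant log(1 + (N - df + 0.5)/(df + 0.5)) are common defaults assumed here; the slides do not state them. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict

def bm25_scores(query: str, docs: Dict[str, Counter], k1: float = 1.2, b: float = 0.75) -> Dict[str, float]:
    """Okapi BM25 score of every document (e.g. a type's aggregated descriptions) for the query."""
    n_docs = len(docs)
    if n_docs == 0:
        return {}
    avg_len = sum(sum(tf.values()) for tf in docs.values()) / n_docs
    df: Counter = Counter()
    for tf in docs.values():
        df.update(w for w, c in tf.items() if c > 0)  # document frequency of each term
    scores: Dict[str, float] = {}
    for name, tf in docs.items():
        doc_len = sum(tf.values())
        score = 0.0
        for w in query.lower().split():
            if tf[w] == 0:
                continue  # a term absent from this document contributes nothing
            idf = math.log(1 + (n_docs - df[w] + 0.5) / (df[w] + 0.5))
            score += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * doc_len / avg_len))
        scores[name] = score
    return scores
```

For an entity-centric signal, the same function could score entity descriptions instead; keeping the 20 best entities and summing their scores per type would give a feature in the spirit of the top-20 BM25 signal.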

Feature Analysis
- Knowledge base features, e.g., number of entities the type covers
- Type label statistics, e.g., length in characters
- Query-type lexical similarities, e.g., Jaccard
[Figure: per-feature Gini score and NDCG@5 plot, as on the previous slide]
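One example from each of these three groups is sketched below. Assumptions: type names are CamelCase identifiers (split into words for matching), the entity count only includes direct type assignments, and label length is measured on the name as given. The function names and feature keys are hypothetical, not the feature names from the plot.

```python
import re
from typing import Dict, List, Set

def split_type_label(label: str) -> List[str]:
    """Split a CamelCase type name into lowercase words, e.g. 'FilmDirector' -> ['film', 'director']."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", label)]

def count_entities(entity_types: Dict[str, Set[str]], type_name: str) -> int:
    """Number of entities directly assigned to the type."""
    return sum(1 for types in entity_types.values() if type_name in types)

def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def type_features(query: str, type_name: str, entity_types: Dict[str, Set[str]]) -> Dict[str, float]:
    q_terms = set(query.lower().split())
    t_terms = set(split_type_label(type_name))
    return {
        "num_entities": float(count_entities(entity_types, type_name)),  # knowledge base feature
        "label_length_chars": float(len(type_name)),  # type label statistic
        "jaccard_terms": jaccard(q_terms, t_terms),  # query-type lexical similarity
    }
```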

Next Generation of LTR-based Target Type Identification
Some insights and challenges:
- Generalization to other type systems
- Training data (acquisition) bottleneck
- Neural-based features

Generalization to Other Type Systems
Does this method (or do these features) generalize to other type systems?
How can the acquisition of such type labels be eased? E.g., by knowledge transfer from the test collection of DBpedia target types.

Training Data Acquisition
Would more training data be beneficial?
How can high-quality labeled data be collected? E.g., automatically, by weak supervision.

Neural-based Features
We use semantic similarities based on pre-trained word embeddings.
There is a trend toward using neural features within an LTR approach.
Can we go fully neural?
