

About "Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling"

Summary of the paper "Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling" presented at SIGIR 2018
Date: October 17, 2018
Venue: London, UK. Reading group

Please cite, link to or credit this presentation when using it or part of it in your work.

#InformationRetrieval #IR #EntityOrientedSearch

Darío Garigliotti


Transcript

  1. Reading Group 17.10.2018
    Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling
    Xiong, Liu, Callan, and Liu, SIGIR 2018
  2. Motivation
    • Interest in knowing how salient (important and central) a term (word or entity) is in a document
    • Word frequency has been widely exploited for document retrieval
    • But frequency is not necessarily the same as salience
    • Entity salience is still a young task
    • The effectiveness of salience for ad hoc search has not been explored yet
  3. Main Messages
    • They represent an entity by combining textual information with semantics from a knowledge base
    • They model salience better than frequency alone by capturing interactions between entities
    • They can improve web document search using salience, generalizing from a news corpus
  4. Experimental Setup
    • Two datasets:
      • New York Times (~500k articles with summaries): an entity e in an article is salient if it also appears in the summary
      • Semantic Scholar (~1M abstracts): an entity e in an abstract is salient if it also appears in the title
    • Ranking-focused metrics: P@1, P@5, R@1, R@5
    • Well-documented parameter settings
  5. Analysis vs. Frequency
    • Compared with frequency, KESM (Kernel Entity Salience Model) is able to model the salience of tail entities
    • KESM is more reliable than frequency on short documents
  6. Salience for Ad hoc Search
    • Entity salience should reflect a better understanding of the text
    • So it should help document search
    • Ranking uses the salience of the query entities in each candidate document
    • Training: either end-to-end when sufficient relevance labels are available, or using the model pre-trained on salience
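Slide 3 credits the improvement over frequency to entity interactions, and the title names kernels. Below is a minimal sketch of one way to turn interactions into features. It assumes Gaussian (RBF) kernel pooling over cosine similarities, in the style of K-NRM. The toy embeddings, the kernel means `mus`, the width `sigma`, and the linear `weights`/`bias` are all hypothetical. The sketch also omits how KESM builds entity representations from text and the knowledge base, so treat it as an illustration rather than the authors' implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def kernel_features(target, context, mus, sigma=0.1):
    """Gaussian kernel pooling: for each kernel mean mu, softly count the context
    vectors whose cosine similarity to `target` lies near mu, then take the log."""
    sims = [cosine(target, c) for c in context]
    features = []
    for mu in mus:
        soft_count = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in sims)
        features.append(math.log(max(soft_count, 1e-10)))  # clamp to avoid log(0)
    return features

def salience_score(features, weights, bias):
    """Hypothetical linear layer mapping kernel features to one salience score."""
    return bias + sum(w * f for w, f in zip(weights, features))

# Toy 2-d embeddings (made up): score entity "a" against the other entities.
emb = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0]}
feats = kernel_features(emb["a"], [emb["b"], emb["c"]], mus=[1.0, 0.5, 0.0])
print(feats, salience_score(feats, weights=[0.5, 0.1, 0.2], bias=0.0))
```

Each kernel summarizes how many other entities sit at a given similarity level to the target. The learned weights can then reward, for example, an entity that many related entities point to. Frequency alone cannot capture this.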
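The labeling rule and metrics on slide 4, together with the frequency baseline that slide 5 compares against, fit in a few lines. This is a minimal sketch that rests on three assumptions. First, entities arrive as already-linked string IDs. Second, the baseline breaks frequency ties alphabetically, which is our choice. Third, the toy article and summary are made up. For Semantic Scholar, pass the title entities in place of the summary entities.

```python
from collections import Counter

def salience_labels(doc_entities, reference_entities):
    """An entity in the document is salient iff it also occurs in the reference
    text (the summary for NYT, the title for Semantic Scholar)."""
    reference = set(reference_entities)
    return {e: e in reference for e in set(doc_entities)}

def frequency_ranking(doc_entities):
    """Frequency baseline: most frequent entities first, ties broken alphabetically."""
    counts = Counter(doc_entities)
    return sorted(counts, key=lambda e: (-counts[e], e))

def precision_recall_at_k(ranking, labels, k):
    """P@k and R@k of a ranked entity list against binary salience labels."""
    salient = {e for e, is_salient in labels.items() if is_salient}
    hits = sum(1 for e in ranking[:k] if e in salient)
    precision = hits / k
    recall = hits / len(salient) if salient else 0.0
    return precision, recall

# Toy example (made-up entities): an article and its summary.
article = ["Paris", "France", "Paris", "Louvre", "France", "Paris", "museum"]
summary = ["France", "museum"]
labels = salience_labels(article, summary)
ranking = frequency_ranking(article)
for k in (1, 5):
    print(f"P@{k}, R@{k} =", precision_recall_at_k(ranking, labels, k))
```

In this toy case the most frequent entity, "Paris", does not appear in the summary. The frequency baseline therefore scores P@1 = 0, which illustrates the gap between frequency and salience that slide 2 points out.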
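Slide 6 ranks candidates by the salience of the query entities in each document. Below is a deliberately simple sketch of that idea. It assumes the salience scores come from some model pre-trained on salience, represented here by a hypothetical dictionary. It also assumes the scores are combined with a plain sum. The paper's actual ranking model is not reproduced here.

```python
def rank_documents(query_entities, doc_salience):
    """Rank document IDs by the summed salience of the query entities in each
    document; query entities absent from a document contribute 0.
    doc_salience maps doc ID -> {entity: salience score} (hypothetical input)."""
    def score(doc_id):
        scores = doc_salience[doc_id]
        return sum(scores.get(e, 0.0) for e in query_entities)
    # Highest score first; ties broken by document ID for determinism.
    return sorted(doc_salience, key=lambda d: (-score(d), d))

# Toy salience scores (made up).
doc_salience = {
    "d1": {"Paris": 0.9, "Louvre": 0.2},
    "d2": {"Louvre": 0.8, "museum": 0.6},
    "d3": {"France": 0.5},
}
print(rank_documents(["Louvre"], doc_salience))
```

A document that merely mentions a query entity in passing ("d1") ranks below one where that entity is central ("d2"), even though both contain it.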