
Learning to Rank 101, Bringing personalisation to data discovery

Pere Urbón
December 06, 2017


Transcript

  1. Learning To Rank 101. Pere Urbón Bayes, Data Wrangler. www.springernature.com www.purbon.com
  2. About me. Pere Urbón-Bayes (Berliner since 2011). Software Architect and Data Engineer. All about systems, data and teams. Open Source Advocate and Contributor.
  3. Building Search. A search engine is an information retrieval system designed to help find information stored on a computer system. wikipedia.org/wiki/Search_engine_(computing)
  4. Building Search. When search works, it can feel almost magical: you simply type in what you’re looking for and it’s served up in mere milliseconds. It’s fast, convenient, and super efficient – no wonder so many users prefer search over clicking around the site’s categories! www.baymard.com
  5. Search, how does this work? (Diagram.) A collection of documents D = {d_1, d_2, ..., d_N} is indexed by the IR system. A query q comes in, and the system returns a ranked list of documents d_{q,1}, d_{q,2}, ..., d_{q,n}. The ranking is based on relevance scores such as TF-IDF or BM25 (a minimal scoring sketch follows below).
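
     As a rough illustration of the scoring the slide names, here is a minimal TF-IDF ranker in Python. This is a sketch, not the talk's code: the corpus, query and function name are invented for the example, and real engines use refinements such as BM25's length normalization.

       import math
       from collections import Counter

       def tf_idf_score(query_terms, doc_terms, docs):
           """Score one document against a query with plain TF-IDF."""
           counts = Counter(doc_terms)
           score = 0.0
           for term in query_terms:
               tf = counts[term] / len(doc_terms)              # term frequency
               df = sum(1 for d in docs if term in d)          # document frequency
               idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed IDF
               score += tf * idf
           return score

       docs = [["cheap", "flights", "berlin"],
               ["berlin", "city", "guide"],
               ["learning", "to", "rank"]]
       query = ["flights", "berlin"]
       ranked = sorted(docs, key=lambda d: tf_idf_score(query, d, docs), reverse=True)
       print(ranked[0])  # the flights-to-berlin document ranks first
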
  6. Building search. The phases of building a search engine, applied at both indexing time and query time (a toy pipeline is sketched below):
     • Tokenization ◦ synonyms (filter) ◦ stop words (filter) ◦ whitespace ◦ n-gram
     • Analyzer ◦ languages ◦ keywords ◦ standard
     • Normalization
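
     To make those phases concrete, here is a toy analysis chain in Python. The stop-word list and synonym map are invented for the example; production analyzers (e.g. Lucene's) are far richer.

       STOP_WORDS = {"the", "a", "of", "to"}        # toy stop-word filter
       SYNONYMS = {"ml": "machine-learning"}        # toy synonym filter

       def analyze(text):
           """Whitespace tokenization + lowercase normalization + filters."""
           tokens = text.lower().split()                        # whitespace tokenizer
           tokens = [SYNONYMS.get(t, t) for t in tokens]        # synonym expansion
           tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
           return tokens

       # The same chain runs at indexing time and at query time, so
       # documents and queries meet in the same token space.
       print(analyze("The basics of ML"))  # ['basics', 'machine-learning']
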
  7. The second line of defence (a scorer sketch follows below):
     • Tags and Ontologies.
     • Natural Language Processing.
     • Result click tracking.
     • Genetic and evolutionary methods to optimize boosting and weights.
     • Build your own scorer.
     • ...
     Scary and Complex!!!
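
     "Build your own scorer" usually starts with hand-tuned per-field boosts. A minimal sketch: the boost values here are invented, and they are exactly the knobs a genetic or evolutionary search would optimize.

       # Hypothetical per-field boosts, the weights an evolutionary
       # method would tune against click or relevance data.
       BOOSTS = {"title": 3.0, "tags": 2.0, "body": 1.0}

       def custom_score(query_terms, doc):
           """Boost-weighted count of query-term matches per field."""
           score = 0.0
           for field, boost in BOOSTS.items():
               tokens = doc.get(field, "").lower().split()
               score += boost * sum(tokens.count(t) for t in query_terms)
           return score

       doc = {"title": "learning to rank", "tags": "search ranking", "body": ""}
       print(custom_score(["rank", "ranking"], doc))  # 3.0*1 + 2.0*1 = 5.0
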
  8. Learning to Rank. The use of machine learning (supervised, semi-supervised, …) to improve the construction of ranking models for information retrieval. Common applications are search engines, collaborative filtering, machine translation, computational biology, etc. The idea was introduced in 1992 by Norbert Fuhr, who described learning in information retrieval as a parameter estimation problem.
  9. Learning to Rank, how does this work? (Diagram.) Training data consists of past queries q_1, ..., q_m, each with an associated list of documents d_{i,1}, d_{i,2}, ..., d_{i,n}. A learning system fits a scoring function f(q, d) from this data. For a new query q_{m+1}, the IR system returns the documents ranked by their learned scores f(q_{m+1}, d_i).
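
     The learned function f(q, d) operates on features extracted per (query, document) pair. A hedged sketch of query-time scoring: the features and weights here are invented, and in practice the weights come from the learning system.

       def features(query, doc):
           """A few illustrative features; real systems use dozens."""
           q_terms = set(query.lower().split())
           title = doc["title"].lower().split()
           body = doc["body"].lower().split()
           return [
               sum(t in q_terms for t in title),  # title term matches
               sum(t in q_terms for t in body),   # body term matches
               len(body),                         # document length
           ]

       WEIGHTS = [2.0, 1.0, -0.01]  # stand-in for a trained linear model

       def f(query, doc):
           return sum(w * x for w, x in zip(WEIGHTS, features(query, doc)))

       docs = [{"title": "rank models", "body": "learning to rank models"},
               {"title": "city guide", "body": "berlin city guide"}]
       print(sorted(docs, key=lambda d: f("rank", d), reverse=True)[0]["title"])
       # -> 'rank models'
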
  10. Learning to Rank. Algorithms can be divided into three groups:
      • Pointwise: if we assume that each (query, document) pair gets a score, the problem can be approximated by a regression on that score.
      • Pairwise: the problem is treated as a classification problem, learning to order each given pair of documents.
      • Listwise: optimizes a loss defined over the entire ranked list (often an IR metric such as NDCG), averaged over all queries.
      Order of quality: Listwise > Pairwise > Pointwise. (A sketch of the pointwise and pairwise data layouts follows below.)
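
     To make the pointwise/pairwise distinction concrete, here is how the same judged data becomes regression targets versus classification pairs. The relevance grades are toy values invented for the example.

       # Toy judgments for one query: (doc_id, relevance grade).
       judged = [("d1", 3), ("d2", 1), ("d3", 2)]

       # Pointwise: each document is a regression example (features -> grade).
       pointwise = list(judged)

       # Pairwise: each pair with different grades becomes a classification
       # example; label 1 means "first should rank above second".
       pairwise = [((a, b), 1 if ga > gb else 0)
                   for i, (a, ga) in enumerate(judged)
                   for b, gb in judged[i + 1:]
                   if ga != gb]

       print(pointwise)  # [('d1', 3), ('d2', 1), ('d3', 2)]
       print(pairwise)   # [(('d1', 'd2'), 1), (('d1', 'd3'), 1), (('d2', 'd3'), 0)]
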
  11. Learning to Rank. The most popular algorithms are:
      • RankNet, LambdaRank and LambdaMART by Christopher J.C. Burges et al.: www.microsoft.com/en-us/research/publication/ranking-boosting-and-model-adaptation/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F69536%2Ftr-2008-109.pdf
      • RankSVM and other gradient descent variants.
      (The RankNet pairwise loss is sketched below.)
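
     For a flavour of the RankNet family: its pairwise loss is a cross-entropy on the sigmoid of a score difference. A minimal sketch with illustrative scores, not a trained model.

       import math

       def ranknet_loss(s_i, s_j, p_ij=1.0):
           """RankNet pairwise cross-entropy. p_ij is the target probability
           that document i should rank above document j."""
           p_model = 1.0 / (1.0 + math.exp(-(s_i - s_j)))  # sigmoid of score diff
           return -(p_ij * math.log(p_model)
                    + (1.0 - p_ij) * math.log(1.0 - p_model))

       print(ranknet_loss(2.0, 0.5))  # correctly ordered pair, ~0.20
       print(ranknet_loss(0.5, 2.0))  # inverted pair, ~1.70
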
  12. References
      • Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. "The Million Song Dataset." In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
      • Million Song Dataset, official website by Thierry Bertin-Mahieux: http://labrosa.ee.columbia.edu/millionsong/
      • Tie-Yan Liu (2009). "Learning to Rank for Information Retrieval." Foundations and Trends in Information Retrieval, 3 (3): 225–331. doi:10.1561/1500000016. ISBN 978-1-60198-244-5.