Learning to Rank 101, Bringing personalisation to data discovery

Pere Urbón
December 06, 2017


  1. Learning To Rank 101

    Pere Urbon Bayes, Data Wrangler
    www.springernature.com www.purbon.com
  2. About me

    Pere Urbon Bayes (Berliner since 2011)
    Software Architect and Data Engineer
    All about systems, data and teams
    Open Source Advocate and Contributor
  3. Everything will be available from

    • github.com/purbon/learning_to_rank_101
    • speakerdeck.com/purbon

  4. Building a new search functionality

  5. Building Search

    "A search engine is an information retrieval system designed to help find information stored on a computer system."
    wikipedia.org/wiki/Search_engine_(computing)
  6. Building Search

    "When search works, it can feel almost magical: you simply type in what you’re looking for and it’s served up in mere milliseconds. It’s fast, convenient, and super efficient – no wonder so many users prefer search over clicking around the site’s categories!"
    www.baymard.com
  7. Search, how does this work?

    [Diagram] A query q goes to the IR System, which holds the documents D = {d_1, d_2, ..., d_N} and returns a ranked list of documents d_{q,1}, d_{q,2}, d_{q,3}, ..., d_{q,n}.
    Ranking is based on relevance: TF-IDF, BM25.
  8. Building search

    The phases of building a search engine:
    • Tokenization
      ◦ synonyms (filter)
      ◦ stop words (filter)
      ◦ whitespace
      ◦ ngram
    • Analyzer
      ◦ languages
      ◦ keywords
      ◦ standard
    • Normalization
    [Diagram labels] Indexing Time, Query Time
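The phases above can be sketched as a tiny analysis chain. This is only an illustration in plain Python, not how any particular search engine implements it: the stop word list, the synonym map, the function names and the order of the steps are all assumptions made for the example.

```python
import re

# Hypothetical resources, chosen only for this example.
STOP_WORDS = {"the", "a", "of", "to", "and"}
SYNONYMS = {"film": "movie"}


def normalize(text):
    """Normalization: lowercase and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", text.lower())


def whitespace_tokenize(text):
    """Tokenization: split on runs of whitespace."""
    return text.split()


def analyze(text):
    """Normalize, tokenize, drop stop words, then map synonyms."""
    tokens = whitespace_tokenize(normalize(text))
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [SYNONYMS.get(t, t) for t in tokens]


def char_ngrams(token, n=3):
    """Character n-grams of a single token (empty if the token is shorter than n)."""
    return [token[i:i + n] for i in range(len(token) - n + 1)]


print(analyze("The Film of the Year!"))
print(char_ngrams("rank"))
```

Running the same `analyze` function on documents at indexing time and on the query at query time is what makes the two sides comparable.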
  9. TF-IDF

    Term Frequency - Inverse Document Frequency
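As a rough illustration of the idea, here is a minimal sketch of one common TF-IDF variant: term frequency as the share of a document's tokens, and inverse document frequency as log(N / df). Other variants (smoothing, different logarithms) exist, and the corpus and function names below are made up for the example.

```python
import math
from collections import Counter


def term_frequency(term, document):
    """Share of the document's tokens that are equal to `term`."""
    if not document:
        return 0.0
    return Counter(document)[term] / len(document)


def inverse_document_frequency(term, corpus):
    """log(N / df): terms that appear in fewer documents weigh more."""
    df = sum(1 for document in corpus if term in document)
    if df == 0:
        return 0.0  # assumption: unseen terms contribute nothing
    return math.log(len(corpus) / df)


def tf_idf(term, document, corpus):
    return term_frequency(term, document) * inverse_document_frequency(term, corpus)


# Hypothetical, already tokenized corpus.
corpus = [
    ["learning", "to", "rank"],
    ["search", "engine", "rank"],
    ["search", "relevance"],
]

# "learning" occurs in one document, "rank" in two, so "learning" weighs more.
print(tf_idf("learning", corpus[0], corpus))
print(tf_idf("rank", corpus[0], corpus))
```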

  10. Okapi BM25

    Okapi search: Best Matching 25 (BM25)
    Others: PageRank, Learning to Rank, ...
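For reference, a minimal sketch of BM25 scoring, assuming the commonly used defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant log((N - n + 0.5) / (n + 0.5) + 1). The corpus and function names are invented for the example; real engines differ in details.

```python
import math


def bm25_idf(term, corpus):
    """IDF variant that stays non-negative even for very common terms."""
    n = sum(1 for document in corpus if term in document)
    return math.log((len(corpus) - n + 0.5) / (n + 0.5) + 1.0)


def bm25_score(query, document, corpus, k1=1.2, b=0.75):
    """Sum over query terms of IDF times a saturated, length-normalized TF."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    score = 0.0
    for term in query:
        f = document.count(term)
        if f == 0:
            continue
        length_norm = k1 * (1.0 - b + b * len(document) / avgdl)
        score += bm25_idf(term, corpus) * f * (k1 + 1.0) / (f + length_norm)
    return score


# Hypothetical, already tokenized corpus.
corpus = [
    ["learning", "to", "rank"],
    ["search", "engine", "ranking", "with", "bm25", "rank"],
    ["cooking", "recipes"],
]

query = ["rank"]
ranked = sorted(corpus, key=lambda d: bm25_score(query, d, corpus), reverse=True)
print(ranked[0])
```

Both of the first two documents contain "rank" once, but the shorter one scores higher because of the length normalization controlled by `b`.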
  11. The second line of defence

    • Tags and Ontologies.
    • Natural Language Processing.
    • Result click tracking.
    • Genetic and evolutionary methods to optimize boosting and weights.
    • Build your own scorer
    • ...
    Scary and Complex!!!
  12. Building great search (can be an art)

  13. Learning to Rank

  14. Learning to Rank

    The use of machine learning (supervised, semi-supervised, …) to build ranking models for information retrieval. Common applications are in search engines, collaborative filtering, machine translation, biological computation, etc.
    The idea was introduced in 1992 by Norbert Fuhr, who described learning in information retrieval as a parameter estimation problem.
  15. Learning to Rank, how does this work?

    [Diagram] Training: the queries q_1, ..., q_m, each with its documents (d_{1,1}, d_{1,2}, d_{1,3}, ... for q_1, up to d_{m,1}, d_{m,2}, d_{m,3}, ..., d_{m,n} for q_m), feed the Learning System, which produces a scoring function f(q, d).
    Ranking: a new query q_{m+1} goes to the IR System, which holds the documents D = {d_1, d_2, ..., d_N} and returns a ranked list d_{q,1}, d_{q,2}, ..., d_{q,n}, where each document d_i is scored by f(q_{m+1}, d_i).
  16. Learning to Rank

    Algorithms can be divided into three groups:
    • Pointwise: if we assume that each (query, document) pair gets a score, the problem can be approximated by a regression.
    • Pairwise: the problem is treated as a classification problem, learning to decide which document of a given pair is the better one.
    • Listwise: tries to directly optimize the value of a ranking evaluation measure, averaged over all queries.
    Order of quality: Listwise > Pairwise > Pointwise.
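To make the difference between the first two groups concrete, here is a small sketch of how the same labelled data turns into pointwise and pairwise training examples, plus a perceptron-style linear scorer trained on the pairs. The data, the feature values and the perceptron update are assumptions chosen for the example; this is not RankSVM or any other named algorithm.

```python
from itertools import combinations

# Hypothetical data: query id -> list of (feature vector, relevance label).
TRAINING = {
    "q1": [([1.0, 0.2], 2), ([0.4, 0.1], 1), ([0.1, 0.9], 0)],
    "q2": [([0.2, 0.3], 0), ([0.9, 0.5], 1)],
}


def pointwise_examples(data):
    """Pointwise: every (query, document) pair is one regression example."""
    return [(x, float(y)) for docs in data.values() for x, y in docs]


def pairwise_examples(data):
    """Pairwise: for two documents of the same query with different labels,
    emit the feature difference and +1 / -1 for which one should rank higher."""
    pairs = []
    for docs in data.values():
        for (xa, ya), (xb, yb) in combinations(docs, 2):
            if ya == yb:
                continue
            diff = [a - b for a, b in zip(xa, xb)]
            pairs.append((diff, 1 if ya > yb else -1))
    return pairs


def score(w, x):
    """Linear scoring function f(q, d) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(pairs, epochs=50, lr=0.1):
    """Perceptron-style updates: adjust w whenever a pair is ordered wrongly."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for diff, sign in pairs:
            if sign * score(w, diff) <= 0:
                w = [wi + lr * sign * di for wi, di in zip(w, diff)]
    return w


w = train_pairwise(pairwise_examples(TRAINING))
ranked_q1 = sorted(TRAINING["q1"], key=lambda doc: score(w, doc[0]), reverse=True)
print([label for _, label in ranked_q1])
```

A listwise method would instead look at the whole ranked list per query and optimize a list-level measure, which does not reduce to independent examples like these.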
  17. Learning to Rank

    Most popular algorithms are:
    • RankNet, LambdaRank, LambdaMART by Chris J.C. Burges and others.
      www.microsoft.com/en-us/research/publication/ranking-boosting-and-model-adaptation/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F69536%2Ftr-2008-109.pdf
    • RankSVM or (*) gradient descent variants.
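As a hedged sketch of the pairwise idea behind RankNet: the model produces scores s_i and s_j for two documents, a logistic function turns the score difference into the probability that document i should rank above document j, and training minimizes the cross-entropy cost. The code below assumes the usual formulation with a shape parameter sigma and a pair where i is known to be the more relevant document; the function names are made up, and the scoring model itself (a neural network in RankNet) is left out.

```python
import math


def ranknet_probability(s_i, s_j, sigma=1.0):
    """Modelled probability that document i should rank above document j."""
    return 1.0 / (1.0 + math.exp(-sigma * (s_i - s_j)))


def ranknet_cost(s_i, s_j, sigma=1.0):
    """Cross-entropy cost for a pair where i is known to be more relevant:
    -log(P_ij) = log(1 + exp(-sigma * (s_i - s_j)))."""
    return math.log1p(math.exp(-sigma * (s_i - s_j)))


def ranknet_gradient(s_i, s_j, sigma=1.0):
    """Derivative of the cost with respect to s_i (the derivative with
    respect to s_j is the same value with the opposite sign)."""
    return -sigma / (1.0 + math.exp(sigma * (s_i - s_j)))


# Correctly ordered pair: low cost. Wrongly ordered pair: high cost.
print(ranknet_cost(2.0, 0.0))
print(ranknet_cost(0.0, 2.0))
```

The gradient is largest in magnitude when the pair is ordered wrongly, which is what pushes the two scores apart during gradient descent.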
  18. Not only for the big companies.

  19. References

    • Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul Lamere. The Million Song Dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011), 2011.
    • Million Song Dataset, official website by Thierry Bertin-Mahieux, available at: http://labrosa.ee.columbia.edu/millionsong/
    • Tie-Yan Liu (2009), "Learning to Rank for Information Retrieval", Foundations and Trends in Information Retrieval, 3 (3): 225–331, doi:10.1561/1500000016, ISBN 978-1-60198-244-5.
  20. Demo Time...

  21. Thanks! Questions?

    Pere Urbon Bayes, Data Wrangler
    www.springernature.com www.purbon.com