
Learning to Rank 101, Bringing personalisation to data discovery

Pere Urbón
December 06, 2017


Transcript

  1. Learning To Rank 101
    Pere Urbon Bayes — Data Wrangler
    www.springernature.com
    www.purbon.com

  2. About me
    Pere Urbon - Bayes (Berliner since 2011)
    Software Architect and Data Engineer
    All about systems, data and teams
    Open Source Advocate and Contributor

  3. All will be available from
    ● github.com/purbon/learning_to_rank_101
    ● speakerdeck.com/purbon

  4. Building a new search functionality

  5. Building Search
    A search engine is an information retrieval
    system designed to help find information stored
    on a computer system.
    wikipedia.org/wiki/Search_engine_(computing)

  6. Building Search
    When search works, it can feel almost
    magical: you simply type in what you’re looking
    for and it’s served up in mere milliseconds. It’s
    fast, convenient, and super efficient – no
    wonder so many users prefer search over
    clicking around the site’s categories!
    www.baymard.com

  7. Search, how does this work?
    A set of documents D = {d_1, d_2, ..., d_N} is indexed by the IR
    system; a query q returns a ranked list of documents
    d_{q,1}, d_{q,2}, ..., d_{q,n}.
    Ranking is based on relevance: TF-IDF, BM25.

  8. Building search
    The phases of building a search engine:
    ● Tokenization
    ○ synonyms (filter)
    ○ stop words (filter)
    ○ whitespace
    ○ ngram
    ● Analyzer
    ○ languages
    ○ keywords
    ○ standard
    ● Normalization
    These phases apply both at indexing time and at query time.
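
    The indexing-time phases above can be sketched in a few lines of
    Python; the synonym and stop-word tables here are hypothetical,
    for illustration only:

```python
import re

# Hypothetical synonym and stop-word tables, for illustration only.
SYNONYMS = {"colour": "color"}
STOP_WORDS = {"the", "a", "an", "of", "and"}

def analyze(text):
    """Index-time analysis chain: lowercase, whitespace tokenization,
    then synonym and stop-word filters."""
    tokens = re.split(r"\s+", text.lower().strip())
    tokens = [SYNONYMS.get(t, t) for t in tokens]
    return [t for t in tokens if t and t not in STOP_WORDS]

print(analyze("The Colour of Magic"))  # ['color', 'magic']
```

    The same chain (or a compatible one) must run on queries, so that
    query terms match the terms stored in the index.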

  9. TF-IDF
    Term Frequency - Inverse Document Frequency
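
    A minimal sketch of the classic TF-IDF weight,
    tf(t, d) · log(N / df(t)), over a toy corpus (production engines
    use smoothed variants of both factors):

```python
import math

def tf_idf(term, doc, corpus):
    """Score one (term, document) pair: raw term frequency times the
    log of corpus size over the number of documents containing the term."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)

docs = [["rank", "learn", "rank"],
        ["search", "engine"],
        ["rank", "search"]]
print(round(tf_idf("rank", docs[0], docs), 3))  # 0.811
```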

  10. Okapi BM25
    Okapi Best Matching 25 (BM25)
    Others: PageRank, Learning to Rank, ...
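
    A single-term BM25 sketch using the common k1 and b defaults; a
    real engine sums this score over all query terms:

```python
import math

def bm25(term, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 for a single term: saturated term frequency,
    normalised by document length relative to the corpus average,
    weighted by a smoothed IDF."""
    n = len(corpus)
    df = sum(1 for d in corpus if term in d)
    idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
    tf = doc.count(term)
    avgdl = sum(len(d) for d in corpus) / n
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))

docs = [["rank", "learn", "rank"],
        ["search", "engine"],
        ["rank", "search"]]
```

    Unlike raw TF-IDF, the k1 term makes repeated occurrences saturate,
    and b penalises matches in longer-than-average documents.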

  11. The second line of defence
    ● Tags and Ontologies.
    ● Natural Language Processing.
    ● Result click tracking.
    ● Genetic and evolutionary methods to optimize boosting and weights.
    ● Build your own scorer
    ● ...
    Scary and Complex!!!

  12. Building great search (can be an art)

  13. Learning to Rank

  14. Learning to Rank
    The use of machine learning (supervised, semi-supervised, ...) to
    improve the construction of ranking models for information retrieval.
    Common applications are search engines, collaborative filtering,
    machine translation, computational biology, etc.
    The idea was introduced in 1992 by Norbert Fuhr, who described
    learning in information retrieval as a parameter estimation problem.

  15. Learning to Rank, how does this work?
    Training: judged document lists d_{1,1}, d_{1,2}, ..., d_{1,n} for
    query q_1 through d_{m,1}, d_{m,2}, ..., d_{m,n} for query q_m feed
    a learning system, which produces a scoring function f(q, d).
    Query time: a new query q_{m+1} goes through the IR system over the
    documents D = {d_1, d_2, ..., d_N}, and each candidate document is
    scored with f(q_{m+1}, d_i), yielding the ranked list
    d_{q,1}, d_{q,2}, ..., d_{q,n}.
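
    Once a scoring function f(q, d) is learned, query-time re-ranking
    is just a sort by score. A sketch with hypothetical feature vectors
    and a stand-in linear model:

```python
def rerank(candidates, f):
    """Order candidate documents for a query by a learned scoring
    function f(features) -> relevance, highest first."""
    return sorted(candidates, key=lambda item: f(item[1]), reverse=True)

# Hypothetical feature vectors per document: [bm25_score, title_match].
candidates = [("d1", [0.2, 0.0]), ("d2", [0.9, 1.0]), ("d3", [0.5, 1.0])]
f = lambda x: 0.7 * x[0] + 0.3 * x[1]  # stand-in for a learned model
print([doc for doc, _ in rerank(candidates, f)])  # ['d2', 'd3', 'd1']
```

    In practice the base retrieval (e.g. BM25) selects a small candidate
    set and the learned model only re-orders those top results.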

  16. Learning to Rank
    Algorithms can be divided into three different groups:
    ● Pointwise: if we assume that each (query, document) pair gets a
    score, the problem can be approximated by a regression.
    ● Pairwise: the problem is treated as a classification problem,
    learning how to correctly order each given pair of documents.
    ● Listwise: tries to directly optimize the quality of whole ranked
    lists, averaged over all queries.
    Order of quality: Listwise > Pairwise > Pointwise.
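
    The pairwise idea can be sketched as a perceptron on feature
    differences: whenever the model orders a training pair wrongly, the
    weights are nudged toward the difference vector. The features and
    training pairs below are hypothetical:

```python
def pairwise_perceptron(pairs, n_features, epochs=20, lr=0.1):
    """Learn weights w so that w.x_preferred > w.x_other for every
    training pair; each misordered pair nudges w toward the feature
    difference (a perceptron on pair differences)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, other in pairs:
            diff = [p - o for p, o in zip(preferred, other)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

# Hypothetical per-document features: [bm25, click_rate];
# in each pair the first document is the preferred one.
pairs = [([0.9, 0.8], [0.2, 0.1]), ([0.7, 0.9], [0.4, 0.2])]
w = pairwise_perceptron(pairs, 2)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
```

    Real pairwise methods such as RankNet replace the hard perceptron
    update with a differentiable loss on the pair score difference.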

  17. Learning to Rank
    Most popular algorithms are:
    ● RankNet, LambdaRank and LambdaMART by Chris J.C. Burges et al.
    www.microsoft.com/en-us/research/publication/ranking-boosting-and-model-adaptation/?from=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F69536%2Ftr-2008-109.pdf
    ● RankSVM and other gradient descent variants.

  18. Not only for the big companies.

  19. References
    Thierry Bertin-Mahieux, Daniel P.W. Ellis, Brian Whitman, and Paul
    Lamere. The Million Song Dataset. In Proceedings of the 12th
    International Society for Music Information Retrieval Conference
    (ISMIR 2011), 2011.
    Million Song Dataset, official website by Thierry Bertin-Mahieux,
    available at: http://labrosa.ee.columbia.edu/millionsong/
    Tie-Yan Liu (2009), "Learning to Rank for Information Retrieval",
    Foundations and Trends in Information Retrieval, 3 (3): 225–331,
    doi:10.1561/1500000016, ISBN 978-1-60198-244-5.

  20. Demo Time...

  21. Thanks! Questions?
    Pere Urbon Bayes — Data Wrangler
    www.springernature.com
    www.purbon.com
