
Predicting Online Performance of Job Recommender Systems

Masa Kazama
October 05, 2019

Transcript

  1. Predicting Online Performance of Job Recommender Systems (RecSys 2019 Short Paper)
     Adrien M, Tuan A, @masa_kazama, Jialin K
     Indeed, Tokyo, Japan
  2. Problem and Motivation
     • Online evaluation (A/B testing) is usually the most reliable way to measure the results of our experiments, but it is a slow process.
     • Offline evaluation is faster, but it is critical to make it reliable, as it informs our decision to roll out new improvements in production.
  3. Problem and Motivation
     • Which offline evaluation metrics should we monitor to expect an impact in production?
     • What level of confidence can we have in the offline results?
     • How should we decide whether or not to push a new model to production?
  4. Funnel in job recommendation
     Typical conversion funnel in job recommendation:
     Impression → Click → Apply → Interview → Get a Job
     We focus on the first half of the funnel because the later stages are very sparse.
     apply-rate@10 = (# applies up to rank 10) / (# impressions up to rank 10)
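     As a rough illustration (not code from the deck), apply-rate@k can be computed from impression and apply logs that record the rank at which each item was shown; the (user_id, item_id, rank) event layout below is an assumption.

     # Hypothetical sketch of apply-rate@k; the event layout is assumed.
     def apply_rate_at_k(impressions, applies, k=10):
         """apply-rate@k = (# applies up to rank k) / (# impressions up to rank k)."""
         n_impr = sum(1 for (_, _, rank) in impressions if rank <= k)
         n_appl = sum(1 for (_, _, rank) in applies if rank <= k)
         return n_appl / n_impr if n_impr else 0.0

     impressions = [("User1", "Item2", 1), ("User1", "Item6", 2), ("User2", "Item9", 1)]
     applies = [("User1", "Item2", 1)]
     print(apply_rate_at_k(impressions, applies, k=10))  # 1/3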
  5. Recommendation Models
     • Word2vec (w2v): an embedding model using negative sampling, where the model captures the sequence of user actions
     • Word2vec (w2vhs): a variant of word2vec using hierarchical softmax
     • knn: an item-based collaborative filtering technique
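     A minimal, illustrative sketch of item-based collaborative filtering (not the authors' implementation): score candidate items by cosine similarity between item columns of the user-item interaction matrix.

     # Illustrative item-based knn; the interaction matrix R is made up.
     import numpy as np

     # Rows = users, columns = items; 1 means the user applied to / clicked the item.
     R = np.array([[1, 0, 1, 0],
                   [1, 1, 0, 0],
                   [0, 1, 1, 1]], dtype=float)

     # Cosine similarity between item columns.
     norms = np.linalg.norm(R, axis=0, keepdims=True)
     item_sim = (R.T @ R) / (norms.T @ norms + 1e-12)

     def recommend(user_vector, top_k=2):
         # Score items by similarity to the items the user already interacted with,
         # then drop the items that were already seen.
         scores = item_sim @ user_vector
         scores[user_vector > 0] = -np.inf
         return np.argsort(-scores)[:top_k]

     print(recommend(R[0]))  # top item indices for user 0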
  6. Word2vec for user action data (item embedding)
     • Implicit data
       ◦ Click data
       ◦ Bookmark data
       ◦ Apply data
     Example:
     UserID, ItemID, TimeStamp
     User1, Item2, 2016/02/12
     User1, Item6, 2016/02/17
     User1, Item7, 2016/02/19
     User2, Item2, 2016/02/12
     User2, Item9, 2016/02/17
     User2, Item10, 2016/02/19
     User2, Item12, 2016/02/20
  7. Ex. Apply data
     UserID, ItemID, TimeStamp
     User1, Item2, 2016/02/12
     User1, Item6, 2016/02/17
     User1, Item7, 2016/02/19
     User2, Item2, 2016/02/12
     User2, Item9, 2016/02/17
     User2, Item10, 2016/02/19
     User2, Item12, 2016/02/20
     User1 → [Item2, Item6, Item7]
     User2 → [Item2, Item9, Item10, Item12]
     We consider an ItemID as a word and the items a user clicked as a document, so we can apply word2vec.
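     As a training sketch (the deck does not give exact hyperparameters, so the ones below are assumptions), the per-user item sequences can be fed directly to gensim's Word2Vec; hs=0 with negative sampling corresponds to w2v, and hs=1 to w2vhs.

     # Illustrative only: item embeddings from per-user item sequences with gensim.
     from gensim.models import Word2Vec

     # Each "sentence" is one user's chronologically ordered item IDs from the apply log.
     sequences = [
         ["Item2", "Item6", "Item7"],             # User1
         ["Item2", "Item9", "Item10", "Item12"],  # User2
     ]

     # w2v: skip-gram with negative sampling (vector_size, window, etc. are assumed).
     w2v = Word2Vec(sentences=sequences, vector_size=64, window=5,
                    sg=1, hs=0, negative=5, min_count=1)

     # w2vhs: same model, but with hierarchical softmax instead of negative sampling.
     w2vhs = Word2Vec(sentences=sequences, vector_size=64, window=5,
                      sg=1, hs=1, negative=0, min_count=1)

     # Items most similar to an item a user already applied to.
     print(w2v.wv.most_similar("Item2", topn=3))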
  8. Metrics
     Evaluation metrics
     • MAP
     • MPR
     • Precision@k (p@k)
     • NDCG@k
     • Recall@k (r@k)
     with k in (3, 10, 20, 30, 40)
     Process
     • During two weeks, we run an A/B test with one bucket for each model.
     • Daily, we generate new recommendations based on the past data and compare the performance in production (apply-rate) with the offline performance (p@k, etc.).
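     An illustrative sketch of two of these offline metrics for a single user (not the paper's evaluation code); the example lists are made up.

     def precision_at_k(recommended, relevant, k):
         # p@k: fraction of the top-k recommended items the user actually applied to.
         return sum(1 for item in recommended[:k] if item in relevant) / k

     def recall_at_k(recommended, relevant, k):
         # r@k: fraction of the user's applied items found in the top-k recommendations.
         if not relevant:
             return 0.0
         return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)

     recommended = ["Item7", "Item9", "Item3", "Item2"]   # ranked list from a model
     relevant = {"Item2", "Item7"}                        # items the user applied to
     print(precision_at_k(recommended, relevant, k=3))    # 1/3
     print(recall_at_k(recommended, relevant, k=3))       # 1/2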
  9. Results
     Model  | apply-rate | apply-rate@10 | p@10  | MAP  | MPR   | NDCG@10 | r@10 | r@100
     w2v    | -          | -             | -     | -    | -     | -       | -    | -
     knn    | +17%       | +11%          | +9.3% | -54% | +11%  | -47%    | -38% | +3.2%
     w2vhs  | +48%       | +46%          | +90%  | +51% | -5.1% | +60%    | +70% | +65%
     Cross-model comparison, averaged over days; word2vec (w2v) is used as the baseline.
     The metrics marked in bold on the original slide do not have the expected sign, e.g. online performance increased, but the offline evaluation metric decreased.
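     A hedged sketch of how such a comparison can be assembled (the daily numbers and column layout below are made up, not from the paper): average each metric per model over days, then report the relative change against the w2v baseline.

     # Illustrative only: relative lift vs. the w2v baseline, averaged over days.
     import pandas as pd

     daily = pd.DataFrame({
         "day":   ["d1", "d1", "d1", "d2", "d2", "d2"],
         "model": ["w2v", "knn", "w2vhs"] * 2,
         "p@10":  [0.020, 0.022, 0.037, 0.021, 0.023, 0.041],  # made-up values
     })

     mean_per_model = daily.groupby("model")["p@10"].mean()
     baseline = mean_per_model["w2v"]
     lift = (mean_per_model - baseline) / baseline * 100       # percent change vs. w2v
     print(lift.round(1))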
  10. Conclusion
     • We conclude that these offline evaluation metrics are reliable enough to decide not to deploy new models when the offline performance is significantly negative, and to deploy new models when there is a positive impact on the offline metrics.
     • We recommend p@k, which showed consistent predictive power, when the recommendation task is focused on precision.
  11. OSS contribution
     • Add Recall@k metric to RankingMetrics in Spark
       ◦ https://github.com/apache/spark/pull/23881
     • Add nmslib indexer to Gensim
       ◦ https://github.com/RaRe-Technologies/gensim/pull/2417
     • Write a tutorial for the nmslib indexer in Gensim
       ◦ https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/nmslibtutorial.ipynb
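     A rough usage sketch of the Gensim nmslib indexer mentioned above, assuming its interface mirrors Gensim's Annoy integration (see the linked tutorial for the exact API):

     # Assumed usage: approximate nearest-neighbour lookup over item embeddings.
     from gensim.models import Word2Vec
     from gensim.similarities.nmslib import NmslibIndexer

     sequences = [["Item2", "Item6", "Item7"], ["Item2", "Item9", "Item10", "Item12"]]
     model = Word2Vec(sentences=sequences, vector_size=64, sg=1, negative=5, min_count=1)

     indexer = NmslibIndexer(model)   # build the ANN index over the item vectors
     print(model.wv.most_similar("Item2", topn=3, indexer=indexer))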