Mercan_internship_finalpresentation_nabe-ryo

Transcript

  1. Reward Modeling for Layout Optimizer (Confidential - Do Not Share)

    Ryo Watanabe, Marketplace / Recommendation Team
  2. Ryo Watanabe (@nabe-ryo)

    • Internship from 8/1 to 9/30
    • Recommendation team, ML engineer
    • Mentor: @shido san
    • Manager: @umechan san
    • 1st year of Master’s degree

  3. Executive Summary: Tackled reward modeling for Layout Optimizer

    • Layout Optimizer
      ◦ Optimizes the layout of the home screen in the Mercari app.
      ◦ “Reward modeling” is needed to predict the performance of each layout in order to select the best one.
    A new predictive model using an MLP (Multi-Layer Perceptron, a simple deep neural network), trained on a new dataset, achieved higher performance than the previous model.
  4. Background: Home Screen

    The home screen of the Mercari app can serve various layouts that consist of multiple components. We want to serve the optimal layout, the one that earns the highest engagement. → Layout Optimizer
    [Figure: home screen with the "Retargeting by Like" and "Recommended Keywords" components labeled]
  5. Background: Layout Optimizer

    Layout Optimizer selects the optimal home screen layout in the Mercari app using a bandit algorithm.
    Bandit problem: choose, from several candidates, the one that maximizes a reward, while still exploring alternatives whose reward distribution is unknown.
    However, the reward is based on purchases of items, so it arrives with a delay after a layout is selected, and we cannot tell immediately which layout is best. → The reward must be predicted within a short time.
    [Figure: an agent selects one of the candidates and receives a reward]
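The explore/exploit trade-off described above can be illustrated with a minimal epsilon-greedy sketch. The deck does not say which bandit algorithm Layout Optimizer uses, so epsilon-greedy, the function names, and the simulated reward rates are all assumptions made for illustration only.

```python
import random

def epsilon_greedy_select(estimates, epsilon, rng):
    """Pick a layout index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore a random layout
    return max(range(len(estimates)), key=lambda i: estimates[i])  # best so far

def update_estimate(estimates, counts, arm, reward):
    """Incrementally update the running mean reward of the chosen layout."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

def simulate(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Run the bandit against hypothetical per-layout conversion rates."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rates)
    counts = [0] * len(true_rates)
    for _ in range(steps):
        arm = epsilon_greedy_select(estimates, epsilon, rng)
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        update_estimate(estimates, counts, arm, reward)
    return estimates, counts

# Hypothetical conversion rates for three candidate layouts
estimates, counts = simulate([0.02, 0.05, 0.20])
```

In this sketch the reward is observed immediately after each selection. In the setting described on the slide it is not, which is why a predicted reward is substituted for the delayed one.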
  6. Reward Modeling

    The reward is predicted because it arrives late after LO (Layout Optimizer) selects a layout.
    Input: action logs of users
    Output: conversion rate (a probability from 0 to 1)
    • Problem setting
      ◦ Using one hour of action logs, predict the probability that any item viewed via the home screen is purchased.
  7. Model and data of the conventional method (Data & Model)

    • Model
      ◦ Logistic regression
        ▪ Linear model, binary classification.
    • Data
      ◦ Features
        ▪ 35 kinds of events from client-side logs.
        ▪ Binary features that represent whether each event has occurred.
      ◦ Labels
        ▪ Binary label that indicates whether any item viewed via the home screen was purchased.
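The baseline setup above (binary event features fed to a logistic regression) can be sketched as follows. The event names and weights are hypothetical: the deck only states that there are 35 kinds of client-side events and does not list them.

```python
import math

def binary_features(session_events, vocabulary):
    """Return 1 for each vocabulary event that occurred at least once, else 0."""
    seen = set(session_events)
    return [1 if name in seen else 0 for name in vocabulary]

def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of a linear combination of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical event names standing in for the 35 real ones
EVENT_VOCABULARY = ["home_view", "item_tap", "like", "search"]
session = ["home_view", "item_tap", "item_tap"]

features = binary_features(session, EVENT_VOCABULARY)
# Hypothetical weights; in practice these are learned from labeled sessions
probability = predict_proba(features, [0.1, 0.8, 1.2, 0.3], bias=-3.0)
```

Note that the repeated `item_tap` contributes the same feature value as a single tap, which is the information the binary encoding discards.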
  8. Model Improvement

    Models used in this experiment:
    • MLP
      ◦ Multi-Layer Perceptron
      ◦ A fundamental neural network architecture that consists of linear layers and activation functions.
    • XGBoost
      ◦ A decision-tree-based algorithm that builds an ensemble of decision trees using boosting.
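To make "linear layers and activation functions" concrete, here is a minimal forward pass for a small MLP in plain Python. The deck does not name the activation functions, so ReLU in the hidden layers and a sigmoid at the output are assumptions, and the tiny weights are invented for the example.

```python
import math

def linear(x, weights, bias):
    """Compute W x + b, with weights given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(values):
    """Element-wise max(0, a)."""
    return [max(0.0, a) for a in values]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, layers):
    """layers is a list of (weights, bias); ReLU between layers, sigmoid at the end."""
    hidden = x
    for weights, bias in layers[:-1]:
        hidden = relu(linear(hidden, weights, bias))
    out_weights, out_bias = layers[-1]
    return sigmoid(linear(hidden, out_weights, out_bias)[0])

# Toy network: 2 inputs -> 2 hidden units -> 1 output probability
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
probability = mlp_forward([2.0, 1.0], toy_layers)
```

Without the non-linear activation between the layers, the stacked linear layers would collapse into a single linear model, which is what the logistic regression baseline already is.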
  9. Data Improvement

    Feature modification:
    • Counted the number of times each event occurred.
    • Removed some events that occurred infrequently.
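The two feature changes above can be sketched like this. The threshold `min_total`, the function names, and the event names are hypothetical; the deck does not say how "infrequent" was defined.

```python
from collections import Counter

def frequent_events(sessions, min_total):
    """Keep only events whose total count across all sessions reaches min_total."""
    totals = Counter(event for session in sessions for event in session)
    return sorted(name for name, n in totals.items() if n >= min_total)

def count_features(session_events, vocabulary):
    """Count how many times each vocabulary event occurred in one session."""
    counts = Counter(session_events)
    return [counts[name] for name in vocabulary]

# Hypothetical sessions of client-side events
sessions = [
    ["home_view", "item_tap", "item_tap"],
    ["home_view", "like"],
    ["home_view", "item_tap", "coupon_open"],
]

vocabulary = frequent_events(sessions, min_total=2)
features = [count_features(session, vocabulary) for session in sessions]
```

Here `like` and `coupon_open` each occur once and are dropped, while the two taps in the first session are preserved as a count of 2 rather than a flag of 1.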
  10. Experiments (Experimental Settings)

    Trained the models under the conditions below and evaluated their performance.
    • Dataset
      ◦ Baseline
      ◦ New Data
      ◦ Period: 2022/7/18 - 2022/7/25
    • Criteria
      ◦ AUC of ROC curve
      ◦ AUC of PR curve
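The two criteria can be computed as in the sketch below. The deck does not say how the PR AUC was calculated; average precision is used here as one common summary of the PR curve, which is an assumption, and the labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def average_precision(labels, scores):
    """Mean precision at each positive, walking down the ranking (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Toy labels (1 = purchased) and predicted probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

A random ranking scores about 0.5 on ROC AUC regardless of class balance, whereas its PR AUC is roughly the fraction of positive labels, so the two numbers are not on the same scale when purchases are rare.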
  11. Results

    The improved models trained on the new dataset outperformed the baseline. The MLP achieved the highest ROC AUC and PR AUC.

    Model                      ROC AUC ↑   PR AUC ↑
    Baseline (Logistic Reg)    0.59        0.05
    MLP + New Dataset          0.86        0.22
    XGBoost + New Dataset      0.85        0.21
  12. Conclusion: Contributions and future work

    I worked on improving the model and data used in Layout Optimizer, which optimizes the layout of the home screen in the Mercari app. The new models and data achieved higher performance than the baseline. A possible future work is to check, by A/B testing, whether LO using the new reward model improves metrics such as BCR and GMV.
  13. Details of Models

    MLP parameters
      Num of layers     2
      Hidden units      100, 10
      Optimizer         Adam
      Epochs            20
      Batch size        200
      Learning rate     0.001

    XGBoost parameters
      Max depth         3
      Num of trees      100
      Learning rate     0.3
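For reference, the hyperparameters above can be written as plain configuration dictionaries. The deck does not name the libraries used; the keys below follow the keyword names of scikit-learn's `MLPClassifier` and xgboost's `XGBClassifier` purely as an assumption, and "Epochs" is mapped to `max_iter` on that basis.

```python
# Assumed mapping onto scikit-learn MLPClassifier keyword names
MLP_PARAMS = {
    "hidden_layer_sizes": (100, 10),  # 2 layers: 100 and 10 hidden units
    "solver": "adam",                 # Optimizer: Adam
    "max_iter": 20,                   # Epochs (for the adam solver)
    "batch_size": 200,
    "learning_rate_init": 0.001,
}

# Assumed mapping onto xgboost XGBClassifier keyword names
XGBOOST_PARAMS = {
    "max_depth": 3,
    "n_estimators": 100,  # Num of trees
    "learning_rate": 0.3,
}
```

If those libraries were in use, the dictionaries could be passed as `MLPClassifier(**MLP_PARAMS)` and `XGBClassifier(**XGBOOST_PARAMS)`.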