Slide 1

Confidential - Do Not Share

Reward Modeling for Layout Optimizer
Ryo Watanabe
Marketplace / Recommendation Team

Slide 2

Ryo Watanabe (@nabe-ryo)
● 1st year of Master’s degree
● Internship from 8/1 to 9/30
● ML engineer, Recommendation team
● Mentor: @shido san
● Manager: @umechan san


Slide 3

Executive Summary
Tackled reward modeling for the Layout Optimizer.
● Layout Optimizer
○ Optimizes the layout of the home screen in the Mercari app.
○ “Reward modeling” is needed to predict the performance of each layout so that the best one can be selected.
● A new predictive model using an MLP (Multi-Layer Perceptron, a simple deep neural network) trained on a new dataset achieved higher performance than the previous one.

Slide 4

Background: Home Screen
The home screen of the Mercari app can serve various layouts that consist of multiple components (e.g. the “Retargeting by Like” component and the “Recommended Keywords” component).
We want to serve the optimal layout, the one that earns the highest engagement. → Layout Optimizer

Slide 5

Background: Layout Optimizer
Select the optimal home-screen layout in the Mercari app using a bandit algorithm.
Bandit problem: choose the candidate that maximizes the reward while exploring alternatives whose reward distributions are unknown.
However, the reward, which is based on item purchases, arrives with a delay after the action of selecting a layout, so we cannot tell immediately which layout is the best.
→ It is necessary to predict the reward in a short time.
(Figure: bandit loop in which the Agent selects from the Candidates and receives a Reward.)
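The bandit loop described on this slide can be sketched as a minimal epsilon-greedy simulation. The layouts, reward probabilities, and hyperparameters below are illustrative assumptions, not the production setup:

```python
import random

def epsilon_greedy(reward_fn, n_layouts, n_rounds, epsilon=0.1, seed=0):
    """Serve the layout with the best estimated reward, exploring with prob. epsilon."""
    rng = random.Random(seed)
    counts = [0] * n_layouts   # times each layout was served
    means = [0.0] * n_layouts  # running mean reward per layout
    for _ in range(n_rounds):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(n_layouts)  # explore a random layout
        else:
            arm = max(range(n_layouts), key=lambda a: means[a])  # exploit the best
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
    return means, counts

# Hypothetical Bernoulli rewards: layout 2 is truly the best.
true_ctr = [0.02, 0.05, 0.10]
reward = lambda arm, rng: 1.0 if rng.random() < true_ctr[arm] else 0.0
means, counts = epsilon_greedy(reward, n_layouts=3, n_rounds=20000)
best = max(range(3), key=lambda a: means[a])
```

In practice the true reward arrives too late to run this loop directly, which is exactly why the deck replaces the observed reward with a short-horizon prediction.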

Slide 6

Reward Modeling
Predict the reward, since it arrives late after the Layout Optimizer selects a layout.
● Problem setting
○ Predict the probability that any item viewed via the home screen is purchased, using one hour of action logs.
Input: action logs of users
Output: conversion rate (a probability from 0 to 1)

Slide 7

Model and Data of the Conventional Method
● Model
○ Logistic regression
■ Linear model, binary classification.
● Data
○ Features
■ 35 kinds of events from client-side logs.
■ Binary features representing whether each event occurred or not.
○ Labels
■ Binary label indicating whether any item viewed via the home screen was purchased.
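The conventional setup (binary event features, binary purchase label, logistic regression) can be sketched with a plain gradient-descent implementation. The data is synthetic and the dimensionality is reduced from 35 to 5 for illustration:

```python
import numpy as np

# Hypothetical toy data: rows are user sessions, columns are binary event
# indicators (did event k occur in the session); label = purchase or not.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = (rng.random((n, d)) < 0.3).astype(float)
w_true = np.array([2.0, -1.0, 0.0, 1.5, -0.5])  # assumed ground-truth weights
p = 1.0 / (1.0 + np.exp(-(X @ w_true - 1.0)))
y = (rng.random(n) < p).astype(float)

# Plain logistic regression trained by full-batch gradient descent on log loss.
w, b = np.zeros(d), 0.0
lr = 0.5
for _ in range(500):
    z = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted purchase probability
    grad_w = X.T @ (z - y) / n              # gradient of the log loss w.r.t. w
    grad_b = np.mean(z - y)
    w -= lr * grad_w
    b -= lr * grad_b

preds = 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

Being linear, this model cannot capture interactions between events, which motivates the MLP and XGBoost models introduced later in the deck.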

Slide 8

Model Improvement
Models used in this experiment:
● MLP
○ Multi-Layer Perceptron
○ A fundamental neural network architecture consisting of linear layers and activation functions.
● XGBoost
○ A decision-tree-based algorithm that ensembles decision trees via boosting.
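An MLP of the kind described here is just stacked linear layers with nonlinear activations. A minimal numpy forward pass, with random weights and illustrative sizes, looks like:

```python
import numpy as np

def mlp_forward(x, params):
    """Forward pass: (linear -> ReLU) for hidden layers, then linear -> sigmoid."""
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)  # linear layer + ReLU activation
    W, b = params[-1]
    logits = h @ W + b
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid output: purchase probability

rng = np.random.default_rng(0)
d = 35  # one binary feature per event type, as in the baseline
params = [
    (rng.normal(scale=0.1, size=(d, 100)), np.zeros(100)),
    (rng.normal(scale=0.1, size=(100, 10)), np.zeros(10)),
    (rng.normal(scale=0.1, size=(10, 1)), np.zeros(1)),
]
x = (rng.random((4, d)) < 0.3).astype(float)  # 4 toy sessions
probs = mlp_forward(x, params)
```

The hidden sizes (100, 10) match the hyperparameters listed in the appendix; everything else here is a sketch, not the production model.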

Slide 9

Data Improvement
Feature modification: counted the number of times each event occurred and removed events that occurred less frequently.
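The frequency-based pruning described on this slide can be sketched as follows; the event names and the threshold are hypothetical:

```python
from collections import Counter

def reduce_rare_events(logs, min_count):
    """Keep only event types that occur at least min_count times across all logs."""
    counts = Counter(e for session in logs for e in session)
    keep = {e for e, c in counts.items() if c >= min_count}
    # Re-encode each session with the reduced event vocabulary.
    return keep, [[e for e in session if e in keep] for session in logs]

logs = [
    ["view_home", "tap_item", "view_home"],
    ["view_home", "search", "tap_item"],
    ["view_home", "rare_event"],
]
keep, reduced = reduce_rare_events(logs, min_count=2)
# keep == {"view_home", "tap_item"}; "search" and "rare_event" are dropped
```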

Slide 10

Experiments
Trained the models under the conditions below and evaluated their performance.
● Dataset
○ Baseline
○ New Data
○ Period: 2022/7/18 - 2022/7/25
● Criteria
○ AUC of the ROC curve
○ AUC of the PR curve
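Both criteria can be computed with scikit-learn; here average precision is used as the usual summary of the PR curve, and the labels and scores are toy values. PR AUC is the more informative of the two when positives (purchases) are rare:

```python
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = [0, 0, 1, 1]             # toy purchase labels
y_score = [0.1, 0.4, 0.35, 0.8]   # toy predicted conversion rates

roc = roc_auc_score(y_true, y_score)           # ranking quality over all pairs
ap = average_precision_score(y_true, y_score)  # summary of the PR curve
# roc == 0.75, ap ≈ 0.83 for these toy values
```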

Slide 11

Results
The improved models trained on the new dataset achieved higher results than the baseline. The MLP marked the highest results in both ROC AUC and PR AUC.

Model                     ROC AUC ↑   PR AUC ↑
Baseline (Logistic Reg)   0.59        0.05
MLP + New Dataset         0.86        0.22
XGBoost + New Dataset     0.85        0.21

Slide 12

Conclusion
Contributions and future work.
I worked on improving the model and the data used in the Layout Optimizer, which optimizes the layout of the home screen in the Mercari app. The new models and data achieved higher performance than the baseline.
A possible future work is to verify by A/B testing whether the Layout Optimizer with the new reward model improves business metrics such as BCR and GMV.

Slide 13

Thank you for listening.

Slide 14

Appendix

Slide 15

Details of Models

MLP parameters
  Num of layers: 2
  Hidden units: 100, 10
  Optimizer: Adam
  Epochs: 20
  Batch size: 200
  Learning rate: 0.001

XGBoost parameters
  Max depth: 3
  Num of trees: 100
  Learning rate: 0.3
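The MLP hyperparameters on this slide map naturally onto scikit-learn's MLPClassifier. The deck does not state which framework was actually used, so this mapping, and the toy data, are assumptions for illustration only:

```python
import warnings

import numpy as np
from sklearn.neural_network import MLPClassifier

# Assumed mapping of the slide's MLP hyperparameters onto MLPClassifier.
clf = MLPClassifier(
    hidden_layer_sizes=(100, 10),  # 2 hidden layers: 100 and 10 units
    solver="adam",                 # Adam optimizer
    max_iter=20,                   # epochs
    batch_size=200,
    learning_rate_init=0.001,
    random_state=0,
)

# Toy binary event features (35 event types) and a toy purchase label.
rng = np.random.default_rng(0)
X = (rng.random((400, 35)) < 0.3).astype(float)
y = (X[:, 0] + rng.random(400) > 0.9).astype(int)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # max_iter=20 may not converge on toy data
    clf.fit(X, y)
proba = clf.predict_proba(X)[:, 1]  # predicted conversion rate per session
```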