Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction

Slide 1

Slide 1 text

Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction
Shotaro Ishihara (Nikkei, Inc.), Shuhei Goda (Wantedly, Inc.), Hidehisa Arai (Recruit Co., Ltd.)
July 15th 2021, SIGIR eCom'21
Third place solution at the Purchase Intent Prediction task in the Coveo Data Challenge

Slide 2

Slide 2 text

Competition: Find as many positive samples as possible
[Leaderboard figure: arrows mark our team and the (strong) baseline that predicts all zeros]
https://sigir-ecom.github.io/data-task.html

Slide 3

Slide 3 text

Overview of Purchase Intent Prediction Tasks
2021-07-15 22:05:00, search
2021-07-15 22:05:30, view detail
2021-07-15 22:06:20, view detail
2021-07-15 22:07:00, search
2021-07-15 22:07:30, view detail
2021-07-15 22:07:50, add to cart
2021-07-15 22:08:00, view detail
2021-07-15 22:09:30, search
......
2021-07-15 ??:??:??, purchase or not
The number of browsing events (nb) after "add to cart": nb ∈ {0, 2, 4, 6, 8, 10} in test data
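The role of nb can be illustrated with a small sketch that cuts a session at nb browsing events after "add to cart". The tuple representation, the function name `truncate_after_cart`, and the choice to cut at the first "add to cart" event are assumptions made for illustration; the slides do not specify how the organizers built the test sessions.

```python
# Hypothetical event representation: (timestamp, action) tuples, as on the slide.
session = [
    ("2021-07-15 22:05:00", "search"),
    ("2021-07-15 22:05:30", "view detail"),
    ("2021-07-15 22:06:20", "view detail"),
    ("2021-07-15 22:07:00", "search"),
    ("2021-07-15 22:07:30", "view detail"),
    ("2021-07-15 22:07:50", "add to cart"),
    ("2021-07-15 22:08:00", "view detail"),
    ("2021-07-15 22:09:30", "search"),
]


def truncate_after_cart(events, nb):
    """Keep events up to the first 'add to cart' plus the next nb events.

    Returns None when the session has no 'add to cart' event or has
    fewer than nb events after it (assumed handling, not from the slides).
    """
    actions = [action for _, action in events]
    if "add to cart" not in actions:
        return None
    cart_idx = actions.index("add to cart")
    end = cart_idx + 1 + nb
    if end > len(events):
        return None
    return events[:end]


nb0 = truncate_after_cart(session, 0)  # ends at "add to cart"
nb2 = truncate_after_cart(session, 2)  # includes two later browsing events
```

With nb = 0 the model sees nothing after "add to cart"; with larger nb it sees more of the shopper's later behavior.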

Slide 4

Slide 4 text

Solution Overview
Example input session:
2021-07-15 22:05:00, search
2021-07-15 22:05:30, view detail
2021-07-15 22:06:20, view detail
2021-07-15 22:07:00, search
2021-07-15 22:07:30, view detail
2021-07-15 22:07:50, add to cart
Models:
- Feature engineering -> LightGBM (nb ∈ {0, 2, 4, 6, 8, 10})
- Transformer & LSTM (nb ∈ {0, 2, 4, 6, 8, 10})
Predictions:
- nb ∈ {0, 2, 4, 6, 8}: predict all samples as negative
- nb ∈ {10}: predict a few high-confidence samples as positive by rank averaging of two models
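Rank averaging can be sketched as follows: each model's scores are converted to ranks, the ranks are averaged, and only the top few samples are labeled positive. The function names, the example scores, and the cutoff `k` are hypothetical; the slides do not give the number of positives actually submitted.

```python
def ranks(scores):
    """Return 0-based ascending ranks; ties share the average of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of samples tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2
        for pos in range(i, j + 1):
            result[order[pos]] = avg
        i = j + 1
    return result


def rank_average(scores_a, scores_b):
    """Average the two models' ranks and scale the result to [0, 1]."""
    ra, rb = ranks(scores_a), ranks(scores_b)
    denom = max(len(scores_a) - 1, 1)
    return [(a + b) / (2 * denom) for a, b in zip(ra, rb)]


def predict_top_k(scores_a, scores_b, k):
    """Label the k samples with the highest averaged rank as positive (1)."""
    avg = rank_average(scores_a, scores_b)
    top = set(sorted(range(len(avg)), key=lambda i: avg[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(avg))]


# Hypothetical scores from two models on five sessions.
model_a = [0.10, 0.80, 0.30, 0.95, 0.20]
model_b = [0.02, 0.40, 0.10, 0.30, 0.05]
blended = rank_average(model_a, model_b)
labels = predict_top_k(model_a, model_b, k=2)
```

Working on ranks rather than raw scores makes the blend insensitive to the two models producing probabilities on different scales.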

Slide 5

Slide 5 text

Key Points
Difficulties:
- Train and test data were split by timeline.
- Participants had to extract train data from the original data.
- There was an extreme class imbalance.
- Only ten submissions in total were allowed for the final stage.
Validation methodology:
- Simple cross validation would not be appropriate due to concept drift and class imbalance.

Slide 6

Slide 6 text

Cross Validation & Adversarial Validation
Cross validation: The data is divided into k folds; k-1 folds are used for training and the remaining fold is used for validation, repeated so that each fold serves as the validation fold once.
Adversarial validation: A binary classifier is trained to predict whether a sample belongs to the test data or not. Training data highly similar to the test data is then sampled.
[Figure: Train/Test split, hold-out validation, and cross-validation folds]
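Both ideas can be sketched in a few lines of dependency-free Python. The logistic regression below is only a stand-in for whatever binary classifier is used in practice, and the one-feature toy data, function names, and learning-rate settings are assumptions. For brevity the train samples are scored by the same model that was fit on them; out-of-fold scoring would be the more careful choice.

```python
import math


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) so that each fold is the validation fold once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Batch gradient-descent logistic regression; returns (weights, bias)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def select_test_like(train_X, test_X, n_select):
    """Adversarial validation: label train=0 / test=1, fit a classifier,
    and return indices of the train rows scored as most test-like."""
    X = train_X + test_X
    y = [0] * len(train_X) + [1] * len(test_X)
    w, b = fit_logistic(X, y)
    scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in train_X]
    order = sorted(range(len(train_X)), key=lambda i: scores[i], reverse=True)
    return order[:n_select]


# Toy data: the test feature distribution is shifted upward relative to train.
train_X = [[i / 50] for i in range(50)]        # 0.00 .. 0.98
test_X = [[0.6 + i / 50] for i in range(50)]   # 0.60 .. 1.58
selected = select_test_like(train_X, test_X, n_select=10)
splits = list(k_fold_indices(6, 3))
```

In the toy data the selected rows are the train samples with the largest feature values, that is, the ones that overlap the shifted test distribution.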

Slide 7

Slide 7 text

Our Validation Strategy
[Figure: the train data is split into cross-validation folds (Train / Validation); adversarial validation against the test data is used to select validation data]
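One way to read this strategy is that out-of-fold predictions from cross validation are scored only on the validation samples that adversarial validation ranks as most test-like. The sketch below follows that reading. The plain F1 metric, the example values, and the function names are assumptions for illustration and are not taken from the slides.

```python
def f1_score(y_true, y_pred):
    """Binary F1 for 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate_on_selected(y_true, oof_pred, adv_score, n_select):
    """Score out-of-fold predictions on the n_select most test-like samples."""
    order = sorted(range(len(y_true)), key=lambda i: adv_score[i], reverse=True)
    chosen = order[:n_select]
    return f1_score([y_true[i] for i in chosen], [oof_pred[i] for i in chosen])


# Hypothetical labels, out-of-fold predictions, and adversarial scores.
y_true = [0, 1, 0, 0, 1, 0, 1, 0]
oof_pred = [0, 1, 1, 0, 0, 0, 1, 0]
adv_score = [0.9, 0.8, 0.1, 0.7, 0.2, 0.3, 0.6, 0.4]

score_all = f1_score(y_true, oof_pred)
score_selected = evaluate_on_selected(y_true, oof_pred, adv_score, n_select=4)
```

The two scores can differ noticeably, which is why the choice of validation subset matters when only a few submissions are allowed.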

Slide 8

Slide 8 text

Validation Results
- Adversarial validation results indicated that models for larger nb performed better.
- Using the nb == 10 model let us outperform the baseline. The other models did not work for us.
- When we used all validation data (cross validation) or random selection (extracting the same number of train samples as the test data), we could not get any insight that could be used for the submission.

Slide 9

Slide 9 text

Conclusion
- This paper described a methodology of using adversarial validation to select validation data for the evaluation of machine learning models.
- We tackled the e-commerce purchase intent prediction task, and the insight gained by the proposed methodology enabled us to outperform the baseline.
- Source code is available at https://github.com/upura/sigir-ecom-2021/.
- ACM Reference Format: Shotaro Ishihara, Shuhei Goda, and Hidehisa Arai. 2021. Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction. In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR eCom'21). ACM, New York, NY, USA, 5 pages.