Shotaro Ishihara
July 13, 2021

# Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction

Shotaro Ishihara, Shuhei Goda, and Hidehisa Arai. 2021. Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction. In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR eCom’21). ACM, New York, NY, USA, 5 pages.
https://github.com/upura/sigir-ecom-2021/

## Transcript

1. Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction
Shotaro Ishihara (Nikkei, Inc.), Shuhei Goda (Wantedly, Inc.), Hidehisa Arai (Recruit Co., Ltd.)
July 15th 2021, SIGIR eCom’21
Third place solution at Purchase Intent Prediction task in Coveo Data Challenge

2. Competition: Find as many positive samples as possible
[Figure annotations: "our team" and "(strong) baseline: all zero".]

3. Overview of Purchase Intent Prediction Tasks
2021-07-15 22:05:00, search
2021-07-15 22:05:30, view detail
2021-07-15 22:06:20, view detail
2021-07-15 22:07:00, search
2021-07-15 22:07:30, view detail
2021-07-15 ??:??:??, purchase or not
......
2021-07-15 22:08:00, view detail
2021-07-15 22:09:30, search
The number of browsing events (nb) after “add to cart”:
nb ∈ {0, 2, 4, 6, 8, 10} in test data

4. Solution Overview
Input (session events):
2021-07-15 22:05:00, search
2021-07-15 22:05:30, view detail
2021-07-15 22:06:20, view detail
2021-07-15 22:07:00, search
2021-07-15 22:07:30, view detail
Models:
- Feature engineering -> LightGBM (nb ∈ {0, 2, 4, 6, 8, 10})
- Transformer & LSTM (nb ∈ {0, 2, 4, 6, 8, 10})
Predictions:
- nb ∈ {0, 2, 4, 6, 8}: predict all samples as negative.
- nb ∈ {10}: predict a few high-confidence samples as positive by rank averaging of two models.
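The rank averaging step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy scores, and the choice of `k` are all assumptions made for the example, and the two score lists simply stand in for the outputs of any two models.

```python
def to_ranks(scores):
    """Return 1-based ranks (higher score -> higher rank); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def rank_average(scores_a, scores_b):
    """Blend two models by averaging their ranks, scaled to (0, 1]."""
    n = len(scores_a)
    return [(ra + rb) / (2 * n) for ra, rb in zip(to_ranks(scores_a), to_ranks(scores_b))]


def predict_top_k(blended, k):
    """Label only the k highest blended scores as positive (1), the rest as negative (0)."""
    top = set(sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(blended))]


# Hypothetical scores from two models on five nb == 10 sessions.
scores_a = [0.02, 0.40, 0.10, 0.35, 0.01]
scores_b = [0.30, 0.95, 0.20, 0.90, 0.10]
blended = rank_average(scores_a, scores_b)
labels = predict_top_k(blended, k=2)  # k is an assumption, not a value from the talk
```

Averaging ranks rather than raw scores avoids having to calibrate the two models against each other, which is convenient when only the few most confident samples are predicted as positive.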

5. Key Points
Difficulties:
- Train & test data were split by timeline.
- Participants had to extract train data from the original data.
- There was an extreme class imbalance.
- Only ten submissions in total were allowed for the final stage.
Validation methodology:
- Simple cross validation would not be appropriate due to concept drift and class imbalance.

6. Cross Validation & Adversarial Validation
Cross validation: The data is divided into k folds; k-1 folds are used for training and the remaining fold is used for validation. This is repeated so that each fold serves as the validation fold once.
Adversarial validation: A binary classifier is trained to predict whether a sample belongs to the test data or not. Training data that is highly similar to the test data is then sampled.
[Figure: data split diagram with the labels Train, Test, Validation, Hold out, and Cross validation folds.]
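The adversarial validation idea described on this slide can be sketched as below. This is a toy illustration under stated assumptions, not the authors' implementation: a tiny pure-Python logistic regression stands in for whatever classifier is actually used, the single drifting feature is synthetic, and keeping as many training samples as there are test samples is an arbitrary choice for the example.

```python
import math
import random


def predict_proba(weights, bias, x):
    """Probability that sample x belongs to the test data."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    z = max(-30.0, min(30.0, z))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.1, epochs=300):
    """Fit a small logistic regression by batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


rng = random.Random(0)
# Synthetic drift: one feature in [0, 1) for train, shifted to [0.7, 1.0) for test.
train_raw = [rng.random() for _ in range(200)]
test_raw = [0.7 + 0.3 * rng.random() for _ in range(50)]

# Center the feature on the pooled mean so plain gradient descent behaves well.
pooled_mean = sum(train_raw + test_raw) / (len(train_raw) + len(test_raw))
train_rows = [[v - pooled_mean] for v in train_raw]
test_rows = [[v - pooled_mean] for v in test_raw]

# Label 0 = train, 1 = test, then fit the adversarial classifier.
adv_labels = [0] * len(train_rows) + [1] * len(test_rows)
weights, bias = train_logistic(train_rows + test_rows, adv_labels)

# Score training samples by how test-like they look. For brevity the scores are
# in-sample; out-of-fold scores would be the more careful choice.
test_likeness = [predict_proba(weights, bias, x) for x in train_rows]
n_validation = len(test_rows)  # assumption: keep as many samples as the test set has
validation_idx = sorted(
    range(len(train_rows)), key=lambda i: test_likeness[i], reverse=True
)[:n_validation]
```

In this toy setup the selected validation samples are the training samples whose feature values lie closest to the test range, which is the behaviour the slide describes as sampling training data highly similar to the test data.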

7. Our Validation Strategy
[Figure: cross validation folds over the training data (Train / Validation), followed by a "Select validation data" step that yields the validation sets; the test data is shown separately.]
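One possible reading of this strategy is sketched below: split the training data into cross validation folds, then keep only the most test-like samples from each fold's validation part. The function names, the contiguous fold layout, the precomputed `test_likeness` scores, and `keep_ratio` are all hypothetical choices for illustration and are not taken from the talk.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def select_test_like(valid_idx, test_likeness, keep_ratio):
    """Keep the fraction of a validation fold that looks most like the test data."""
    n_keep = max(1, int(len(valid_idx) * keep_ratio))
    ranked = sorted(valid_idx, key=lambda i: test_likeness[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical adversarial scores: probability that each training sample is "test".
test_likeness = [0.1, 0.8, 0.3, 0.9, 0.2, 0.7, 0.4, 0.6, 0.05, 0.95]

folds = []
for train_idx, valid_idx in k_fold_indices(len(test_likeness), k=5):
    selected = select_test_like(valid_idx, test_likeness, keep_ratio=0.5)
    # A model would be trained on train_idx and evaluated only on `selected`.
    folds.append((train_idx, selected))
```

Evaluating each fold only on the selected samples keeps the usual cross validation training loop unchanged while making the validation score reflect data that resembles the test period.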

8. Validation Results
- Adversarial validation results told us that the models for larger nb performed better.
- Using the nb == 10 model let us outperform the baseline. The other models didn’t work for us.
- When we used all validation data (cross validation) or random selection (extracting the same number of training samples as there are test samples), we couldn’t get any insight that could be used for the submission.

9. Conclusion
- This paper described a methodology of using adversarial validation to select validation data for the evaluation of machine learning models.
- We tackled the e-commerce purchase intent prediction task and the insight gained by the proposed methodology enabled us to outperform the baseline.
- Source code is available at https://github.com/upura/sigir-ecom-2021/.
- ACM Reference Format: Shotaro Ishihara, Shuhei Goda, and Hidehisa Arai. 2021. Adversarial Validation to Select Validation Data for Evaluating Performance in E-commerce Purchase Intent Prediction. In Proceedings of ACM SIGIR Workshop on eCommerce (SIGIR eCom’21). ACM, New York, NY, USA, 5 pages.