Slide 1

Auto Content Moderation in C2C e-Commerce
Shunya Ueta, Suganprabu Nagarajan, Mizuki Sango (Mercari, Inc.)
2020 USENIX Conference on Operational Machine Learning, July 27–August 7, 2020

Slide 2

Contents
1. Content Moderation
2. Auto Content Moderation in C2C e-Commerce
3. Task design and model strategy
4. Offline/online evaluation
5. System architecture
6. Business Impact

Slide 3

Content Moderation
Identify potentially unsafe or inappropriate content in a service. Examples:
● App Discovery with Google Play, Part 3: Machine Learning to Fight Spam and Abuse at Scale
● YouTube Community Guidelines enforcement
● AI advances to better detect hate speech by Facebook
● Advances in content understanding, self-supervision to protect people by Facebook
● Facebook Transparency Report
● A Safe and Secure Marketplace by Mercari
● etc.

Slide 4

What is Mercari?
The Mercari app is a C2C marketplace where individuals can easily sell used items.
● Markets: Japan, U.S.
● Monthly active users: 16+ million
● Total number of items: 1.5+ billion

Slide 5

Why Content Moderation in C2C e-Commerce?
[Diagram: Sellers and Buyers trading on a C2C e-Commerce marketplace]
We want to decrease the risk for customers and the marketplace.
● Sellers sometimes violate policy unintentionally.
● Buyers may buy violating items without knowing it.
● Policy cases: counterfeits, weapons, etc.

Slide 6

Content Moderation system
[Diagram: Sellers sell items on the C2C e-Commerce marketplace. The Moderation Service discovers potentially violating items, hides them and raises alerts for moderators' manual review, so Buyers see a screened marketplace.]

Slide 7

Concept of Moderation Service: Rule based
[Diagram: Rule-based Moderation Service → Moderator manual review]
Pros
● Easy to develop and can be released to production quickly
Cons
● Hard to manage
● Difficult to cover inconsistent spellings, e.g. {NIKE, nike, ないき, ナイキ}
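To make the spelling problem concrete, here is a minimal Python sketch of a rule-based filter. The term list `BANNED_TERMS` and the functions `normalize` and `rule_based_flag` are hypothetical, not Mercari's actual rules. Unicode normalization and casefolding unify NIKE and nike, and a hiragana-to-katakana mapping unifies ないき and ナイキ, but the Latin and katakana spellings still need separate hand-written entries, and any misspelling outside the list (such as "n1ke") is missed.

```python
import unicodedata

# Hypothetical rule list: every spelling the normalizer cannot unify
# still has to be added by hand (here Latin "nike" and katakana "ナイキ").
BANNED_TERMS = {"nike", "ナイキ"}


def normalize(text: str) -> str:
    """Fold width and case, then map hiragana to katakana."""
    # NFKC turns full-width letters such as "ＮＩＫＥ" into ASCII; casefold() lowercases.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Hiragana U+3041..U+3096 sits exactly 0x60 code points below its katakana counterpart.
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch for ch in text
    )


def rule_based_flag(title: str) -> bool:
    """Return True if any banned term appears in the normalized title."""
    normalized = normalize(title)
    return any(term in normalized for term in BANNED_TERMS)


if __name__ == "__main__":
    for title in ["NIKE sneakers", "nike", "ないき スニーカー", "ナイキ", "n1ke sneakers"]:
        print(title, rule_based_flag(title))
```

Every new variant means another rule or another normalization step, which is exactly the "hard to manage" cost listed above.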

Slide 8

Concept of Moderation Service: ML
[Diagram: Moderation Service combining rule-based and Machine Learning components → Moderator manual review]
Pros
● Automatically learns the features of items deleted by moderators
● Adapts to spelling inconsistencies
Cons
● Model updates are hard
● Concept drift (a source of training-serving skew)

Slide 9

How to create the data for ML
[Diagram: Sell items / Report items → Moderation Service (rule based + Machine Learning) → Hide items & Alert → Moderator review]
Dataset
● Positive: items deleted by moderators
● Negative: items not deleted by moderators
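The labeling rule above can be sketched in Python as follows. The `ReviewedItem` record and its fields are hypothetical, and the sketch assumes that only items a moderator actually reviewed receive a label, so items that were never reviewed are left out of the dataset.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReviewedItem:
    """Hypothetical record of one item that may have gone through moderator review."""
    item_id: str
    reviewed: bool
    deleted_by_moderator: bool


def build_dataset(items: List[ReviewedItem]) -> List[Tuple[str, int]]:
    """Label deleted items 1 (positive) and reviewed-but-kept items 0 (negative)."""
    dataset = []
    for item in items:
        if not item.reviewed:
            continue  # assumption: unreviewed items have no trustworthy label
        dataset.append((item.item_id, 1 if item.deleted_by_moderator else 0))
    return dataset
```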

Slide 10

Task Design
● Data is highly imbalanced.
● For each violated topic, the total number of alerts is bounded by a limit set by the moderator team.
All models are trained as one-vs-all:
● Deploying a newly trained model for one class has no side effects on other classes.
● In a single multi-class model, it is hard to improve performance for each topic.
[Diagram: Items split into Negative and Violated Topic A ... Violated Topic N (e.g. counterfeits, weapons); each topic has its own binary model (Model A, Model B, ...) with that topic as Positive]
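A one-vs-all label set for a single topic could be derived like this. The input format (item id mapped to the topic a moderator deleted the item for, or `None` if it was kept) and the names `one_vs_all_labels` and `positive_rate` are assumptions for illustration. Each topic's model sees its own topic as positive and everything else, including items deleted for other topics, as negative, which is also why each positive class is so small.

```python
from typing import Dict, Optional


def one_vs_all_labels(deleted_topic: Dict[str, Optional[str]], topic: str) -> Dict[str, int]:
    """Build binary labels for one topic's model.

    deleted_topic maps item_id -> topic the item was deleted for (None if kept).
    Items of the target topic are positive (1); all other items, including
    items deleted for a different topic, are negative (0).
    """
    return {item_id: int(t == topic) for item_id, t in deleted_topic.items()}


def positive_rate(labels: Dict[str, int]) -> float:
    """Share of positives; a small value signals class imbalance."""
    return sum(labels.values()) / len(labels)
```

Because each topic has its own model, retraining the counterfeits model leaves the weapons model's predictions unchanged, which is the "no side effects" property listed above.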

Slide 11

Multimodality of content
In the case of items, each item has multimodal data:
● Image
● Text
● Category
● Brand
● Price, etc.
We use a multimodal model to improve model performance. See our article: https://tech.mercari.com/entry/2019/09/12/130000

Slide 12

Model selection based on dataset size
● Gradient Boosted Decision Trees (GBDT): efficient for training and inference when the training data is not large
  *Image features are not used in GBDT
● Gated Multimodal Unit (GMU): potentially the most accurate, using multimodal data
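For reference, a Gated Multimodal Unit as described by Arevalo et al. (2017) fuses two modalities through a learned gate: h_v = tanh(W_v x_v), h_t = tanh(W_t x_t), z = σ(W_z [x_v; x_t]), h = z ⊙ h_v + (1 − z) ⊙ h_t. The pure-Python forward pass below follows that published formulation for two modalities (image and text) without bias terms. It is a sketch of the technique, not Mercari's production model, whose layer sizes and extra inputs (category, brand, price) are not described here.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(a: float) -> float:
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def gmu_forward(x_v: Vector, x_t: Vector, w_v: Matrix, w_t: Matrix, w_z: Matrix) -> Vector:
    """Gated Multimodal Unit for two modalities (biases omitted).

    x_v: image features, x_t: text features.
    w_z acts on the concatenation [x_v; x_t] and yields one gate per hidden unit.
    """
    h_v = [math.tanh(a) for a in matvec(w_v, x_v)]    # image-specific hidden state
    h_t = [math.tanh(a) for a in matvec(w_t, x_t)]    # text-specific hidden state
    z = [sigmoid(a) for a in matvec(w_z, x_v + x_t)]  # gate values in (0, 1)
    return [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
```

The gate lets the model lean on whichever modality is more informative for a given item, for example trusting the image when the title is uninformative.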

Slide 13

Offline evaluation
The metric is Precision@K, where K is the bound on the daily total number of alerts for each violated topic, decided by moderators.
Evaluate the new model against the current model using the same item ids, e.g.:
● Current model: prediction results in production on 2020-07-13 → top K
● New model: prediction results on the test dataset for 2020-07-13 (item ids same as production top K)
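Precision@K itself is simple to compute once moderator labels are available. The helper below is a hypothetical sketch, assuming both models are scored on the same item ids and that an id's label is 1 if moderators deleted the item; it is not the evaluation code used in the talk.

```python
from typing import Dict


def precision_at_k(scores: Dict[str, float], labels: Dict[str, int], k: int) -> float:
    """Fraction of the k highest-scoring items that moderators labeled positive.

    Items missing from `labels` count as negatives (an assumption of this sketch).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = sorted(scores, key=scores.get, reverse=True)[:k]
    return sum(labels.get(item_id, 0) for item_id in top_k) / k
```

Calling it once with the current model's scores and once with the new model's scores, over the same ids and labels, gives the side-by-side comparison described above.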

Slide 14

Online evaluation
→ Faster decision making leads to efficient operation
[Diagram: the current model and the new model receive the same traffic, and both send alerts to moderators for manual review]
● Alerts per model: K/2
● Metric: Precision@K/2
After a new model has been live for a certain period, we decide which model should be deprecated based on the metric above.
Classic A/B testing can take several months, because it was difficult to collect enough transactions for a t-test.
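The online split can be sketched the same way: on the same traffic, each model contributes its own top K/2 alerts, and each model is scored by Precision@K/2 against moderator decisions. The function names and inputs are hypothetical, and the slide does not say how an item picked by both models is handled; this sketch simply counts it for both.

```python
from typing import Dict, List, Tuple


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Item ids of the n highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def online_precisions(current: Dict[str, float], new: Dict[str, float],
                      labels: Dict[str, int], k: int) -> Tuple[float, float]:
    """Each model raises K/2 alerts on the same traffic; return Precision@K/2 for both."""
    half = k // 2
    if half == 0:
        raise ValueError("k must be at least 2")

    def precision(item_ids: List[str]) -> float:
        return sum(labels.get(i, 0) for i in item_ids) / half

    return precision(top_n(current, half)), precision(top_n(new, half))
```

Splitting a fixed alert budget between the two models keeps the moderators' workload unchanged while both models are evaluated.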

Slide 15

Offline/online evaluation result
Algorithm | Offline (Precision@K) | Online (Precision@K/2)
GBDT      | +18.2%                | Not released
GMU       | +21.2%                | +23.2%
The table shows the relative performance gain on one violated topic. The offline metric is Precision@K and the online metric is Precision@K/2. The baseline is a logistic regression model that was already in production.
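The percentages in the table are relative gains over the logistic regression baseline. Assuming the usual definition of relative gain (the slide gives only the resulting percentages), for a metric P they are computed as:

```latex
\text{gain} = \frac{P_{\text{new}} - P_{\text{baseline}}}{P_{\text{baseline}}} \times 100\%,
\qquad P \in \{\text{Precision@}K \text{ (offline)},\ \text{Precision@}K/2 \text{ (online)}\}
```

For example, a hypothetical baseline precision of 0.50 rising to 0.60 would be a +20% gain; the absolute precision values behind the table are not given.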

Slide 16

Container-based Training Pipeline
Write manifest files containing requirements such as CPU, GPU, and storage.
[Diagram: BigQuery → Data Load (CPU) → Training (CPU or GPU) → Offline Evaluation (CPU) → BigQuery]
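The manifest format is not shown, so the following Python sketch is purely hypothetical: each pipeline step declares the CPU, GPU, and storage it needs, and a scheduler uses those declarations to pick a node pool. The `StepManifest` fields, the resource numbers, and the pool names are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StepManifest:
    """Hypothetical per-step manifest: the resources one container step requests."""
    name: str
    cpu: int
    gpu: int = 0
    storage_gb: int = 10


# Illustrative pipeline mirroring the diagram; all numbers are made up.
PIPELINE: List[StepManifest] = [
    StepManifest("data_load", cpu=4),           # reads training data from BigQuery
    StepManifest("training", cpu=8, gpu=1),     # CPU or GPU, depending on the model
    StepManifest("offline_evaluation", cpu=4),  # assumed to write metrics back to BigQuery
]


def node_pool_for(step: StepManifest) -> str:
    """Route GPU-requesting steps to a GPU pool and everything else to CPU nodes."""
    return "gpu-pool" if step.gpu > 0 else "cpu-pool"
```

Declaring resources per step means only the training step holds a GPU, while loading and evaluation run on cheaper CPU nodes.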

Slide 17

Serving system architecture
[Diagram: Proxy layer: a Proxy container subscribes to a message queue, sends each request to the prediction layer, and publishes the result to a message queue. Prediction layer: a GBDT-based model runs in a Pod with a single Preprocessing + inference container; a deep-learning-based model runs in a Pod with a Preprocessing container and a Caffe2 Inference container.]
We manage over 15 machine learning models in production.
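The proxy pattern can be illustrated with in-process queues standing in for the message queue. In this hypothetical sketch, `run_proxy` drains the subscribed queue, sends each message to every registered model, and publishes one result per model. The fan-out to all models, the message fields, and the stand-in scoring functions are assumptions, not details from the slide.

```python
import queue
from typing import Callable, Dict

Predictor = Callable[[dict], float]


def run_proxy(inbound: "queue.Queue[dict]", outbound: "queue.Queue[dict]",
              models: Dict[str, Predictor]) -> int:
    """Forward every pending message to each model and publish the scores.

    Returns the number of messages consumed. Stops when the inbound queue is
    empty; a real subscriber would block and keep listening instead.
    """
    consumed = 0
    while True:
        try:
            message = inbound.get_nowait()  # "subscribe" side
        except queue.Empty:
            return consumed
        consumed += 1
        for name, predict in models.items():  # call into the prediction layer
            outbound.put({"item_id": message["item_id"],
                          "model": name,
                          "score": predict(message)})  # "publish" side
```

Keeping the queue handling in the proxy means the prediction Pods only expose inference, so adding one more model does not change how messages are consumed.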

Slide 18

Horizontal Pod Autoscaler (HPA) by Kubernetes
● Reliable system: traffic changes over time, and HPA can adapt to varying traffic
● Cheaper billing cost: reduced to 1/6 by HPA
[Figure: Billing cost transition after applying HPA; x-axis: day, y-axis: billing cost; each color represents one machine learning model]
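According to the Kubernetes documentation, the HPA controller computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue) and skips scaling when that ratio is within a tolerance (0.1 by default). The Python function below reproduces only that core rule plus min/max clamping, for illustration; the default replica bounds are made up, and the real controller also applies stabilization windows and other behaviors not modeled here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Core HPA scaling rule: scale replicas in proportion to metric / target.

    min_replicas and max_replicas defaults are illustrative, not Kubernetes defaults.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas at 90% CPU against a 50% target scale to ceil(2 × 1.8) = 4, and 4 replicas at 20% scale down to 2; paying only for the replicas that traffic requires is where the billing savings come from.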

Slide 19

Impact of Machine Learning system
[Diagram: Moderation Service (rule based + Machine Learning) → Hide & Alert → Moderator manual review]
For example, where the rule-based approach discovered 100 violating items, the Machine Learning system discovered an additional 554 violating items.
The Machine Learning system has increased coverage by 554% over the rule-based approach.
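The 554% figure follows directly from the example counts on the slide: the rule-based approach found 100 violating items and the Machine Learning system found 554 additional ones, so

```latex
\text{coverage increase} = \frac{554}{100} \times 100\% = 554\%,
\qquad \text{total discovered} = 100 + 554 = 654 \text{ items}
```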

Slide 20

Questions and thanks
If you have a question about this talk, please e-mail the first author, Shunya Ueta: [email protected]
Acknowledgements
Co-authors: Suganprabu Nagarajan, Mizuki Sango
Contributors:
● Abhishek Vilas Munagekar, Yusuke Shido, Vamshi Teja Racha, Sumit Verma and Keisuke Umezawa for their contributions to this system
● Dr. Antony for his feedback on the paper
● Yushi Kurita and Yuki Ito as Product Managers, all Trust and Safety project members, and all Customer Service staff as moderators, for making this project a success