Auto Content Moderation in C2C e-Commerce at OpML20

2020 USENIX Conference on Operational Machine Learning

Shunya Ueta, Suganprabu Nagarajan, and Mizuki Sango, Mercari, Inc.

Consumer-to-consumer (C2C) e-Commerce is a large and growing industry with millions of monthly active users. In this paper, we propose auto content moderation for C2C e-Commerce, which uses Machine Learning (ML) to moderate items. We also discuss practical knowledge gained from building and operating our auto content moderation system. The system has been deployed to production at Mercari since late 2017 and has significantly reduced the operational cost of detecting items that violate our policies. It has increased coverage by 554.8% over a rule-based approach.


Shunya Ueta

July 27, 2020


  1. Auto Content Moderation in C2C e-Commerce
    Shunya Ueta, Suganprabu Nagarajan, Mizuki Sango (Mercari, Inc.)
    2020 USENIX Conference on Operational Machine Learning, July 27–August 7, 2020
  2. Contents
    1. Content Moderation
    2. Auto Content Moderation in C2C e-Commerce
    3. Task design and model strategy
    4. Offline/online evaluation
    5. System architecture
    6. Business Impact
  3. Content Moderation
    Identify potentially unsafe or inappropriate content in a service. Examples:
    • App Discovery with Google Play, Part 3: Machine Learning to Fight Spam and Abuse at Scale
    • YouTube Community Guidelines enforcement
    • AI advances to better detect hate speech, by Facebook
    • Advances in content understanding, self-supervision to protect people, by Facebook
    • Facebook Transparency Report
    • A Safe and Secure Marketplace, by Mercari
    • etc.
  4. What is Mercari?
    The Mercari app is a C2C marketplace where individuals can easily sell used items.
    • Japan, U.S.
    • Monthly active users: 16+ million
    • Total number of items: 1.5+ billion
  5. Why Content Moderation in C2C e-Commerce?
    We want to decrease risk for customers and for the marketplace.
    • Sellers may unintentionally violate policy.
    • Buyers may unknowingly buy items that violate policy.
    • Examples of policy violations: counterfeits, weapons, etc.
    (Diagram labels: C2C e-Commerce, Sellers, Buyers)
  6. Content Moderation system
    (Diagram labels: Sellers, Sell items, C2C e-Commerce marketplace, Moderation Service, Hide items & Alert, Moderator, Manual review, screened, Discover, Buyers)
  7. Concept of Moderation Service: Rule based
    Pros
    • Easy to develop and can be quickly released to production
    Cons
    • Hard to manage
    • Difficult to cover spelling inconsistencies, e.g. {NIKE, nike, ないき, ナイキ}
    (Diagram labels: Moderation Service, Rule based, Moderator, Manual review)
  8. Concept of Moderation Service: ML
    Pros
    • Automatically learns the features of items deleted by moderators
    • Adapts to spelling inconsistencies
    Cons
    • Model updates are hard
    • Concept drift (a source of training-serving skew)
    (Diagram labels: Moderation Service, Rule based, Machine Learning, Moderator, Manual review)
  9. How to create the data for ML
    Dataset
    • Positive: items deleted by moderators
    • Negative: items not deleted by moderators
    (Diagram labels: Sell items, Report items, Moderation Service, Rule based, Machine Learning, Hide items & Alert, Moderator, Review)
  10. Task Design
    • Data is highly imbalanced
    • The total number of alerts for each violation topic is capped by the moderator team
    All models are trained as one-vs-all:
    • Deploying a model for one topic has no side effects on the other topics
    • In a single multi-class model, it is hard to improve performance for each topic individually
    (Diagram labels: Negative; Positive: Violated Topic A ... Violated Topic N; Model A, Model B, ...; e.g. counterfeits, weapons)
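To make the one-vs-all setup on slide 10 concrete, here is a minimal sketch of how per-topic binary labels could be derived. The function name, the input format (one topic name or `None` per item), and the choice to treat items from other topics as negatives are assumptions made for illustration; the slides do not describe the actual labeling code.

```python
from typing import Dict, List, Optional


def one_vs_all_labels(
    item_topics: List[Optional[str]], topics: List[str]
) -> Dict[str, List[int]]:
    """Build one binary label vector per violation topic.

    item_topics[i] is the topic an item was deleted for, or None if the
    item was not deleted (hypothetical input format).
    Assumption: items deleted for a different topic count as negatives.
    """
    return {
        topic: [1 if item_topic == topic else 0 for item_topic in item_topics]
        for topic in topics
    }


# Toy data: five items, two violation topics.
item_topics = ["counterfeits", None, "weapons", None, "counterfeits"]
labels = one_vs_all_labels(item_topics, ["counterfeits", "weapons"])
```

Each label vector would then train its own binary model, so retraining or replacing the model for one topic leaves the others untouched.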
  11. Multimodality of content
    Items have multimodal data:
    • Image
    • Text
    • Category
    • Brand
    • Price, etc.
    We use a multimodal model to improve model performance. See our article: https://tech.mercari.com/entry/2019/09/12/130000
  12. Model selection based on dataset size
    • Gradient Boosted Decision Trees (GBDT) → Efficient for training and inference when the training data is not large (*image features are not used in GBDT)
    • Gated Multimodal Unit (GMU) → Potentially the most accurate, using multimodal data
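For readers unfamiliar with the Gated Multimodal Unit named on slide 12, the sketch below shows the two-modality form as commonly described in the GMU literature: each modality gets a tanh projection, and a sigmoid gate computed from both inputs mixes them. The dimensions, weights, and the restriction to image and text are assumptions for illustration; the slides do not specify the architecture used in production.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gmu(x_image: Vector, x_text: Vector,
        w_image: Matrix, w_text: Matrix, w_gate: Matrix) -> Vector:
    """Two-modality GMU: h = z * h_image + (1 - z) * h_text."""
    h_image = [math.tanh(a) for a in matvec(w_image, x_image)]
    h_text = [math.tanh(a) for a in matvec(w_text, x_text)]
    # The gate sees the concatenation of both raw inputs.
    z = [sigmoid(a) for a in matvec(w_gate, x_image + x_text)]
    return [zi * hi + (1.0 - zi) * ht for zi, hi, ht in zip(z, h_image, h_text)]


# Toy weights (hypothetical): a zero gate matrix gives z = 0.5 everywhere,
# so the output is the plain average of the two projections.
fused = gmu(
    x_image=[1.0, 0.0],
    x_text=[1.0],
    w_image=[[1.0, 0.0], [0.0, 1.0]],
    w_text=[[0.0], [1.0]],
    w_gate=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
)
```

In a trained model the gate weights are learned, letting the network decide per item how much to rely on each modality.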
  13. Offline evaluation
    The metric is Precision@K, where K is the bound on the daily total number of alerts for each violation topic, decided by moderators.
    The new model is evaluated against the current model using the same item ids.
    (Diagram, e.g. for 2020-07-13: the current model's prediction result in production, top K; the new model's prediction result on the test dataset, with item ids the same as the production top K)
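The Precision@K metric from slide 13 can be sketched as follows: rank items by model score, take the top K, and measure what fraction are true violations. The toy scores and labels are made up for illustration, and the helper name is an assumption; only the metric definition comes from the slide.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scored items that are true violations."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Hypothetical toy data: the same five items scored by both models
# (label 1 = the item was deleted by a moderator).
labels = [1, 0, 1, 0, 1]
current_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
new_scores = [0.9, 0.3, 0.8, 0.1, 0.7]
k = 3  # stand-in for the daily alert bound set by moderators

current_p = precision_at_k(current_scores, labels, k)
new_p = precision_at_k(new_scores, labels, k)
```

Because K is tied to how many alerts moderators can handle per day, the metric measures quality exactly where the model's output is consumed.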
  14. Online evaluation
    → Faster decision making leads to efficient operation
    • The current model and the new model receive the same traffic, and moderators review the alerts manually
    • Number of alerts per model: K/2
    • Metric: Precision@K/2
    After a new model has been live for a certain time, we decide which model to deprecate based on these metrics.
    Classic A/B testing can take several months; it was difficult to collect enough transactions for a t-test.
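A minimal sketch of the online comparison on slide 14, assuming each model's K/2 alerts and the moderators' deletions are available as lists of item ids. The function names, ids, and the relative-gain summary are assumptions for illustration, not the production decision logic.

```python
from typing import Iterable


def online_precision(alerted_ids: Iterable[int], deleted_ids: Iterable[int]) -> float:
    """Share of a model's alerts that moderators confirmed by deleting the item."""
    alerted = set(alerted_ids)
    if not alerted:
        raise ValueError("model produced no alerts")
    return len(alerted & set(deleted_ids)) / len(alerted)


def relative_gain(new: float, current: float) -> float:
    """Relative improvement of the new model over the current one."""
    return (new - current) / current


# Hypothetical day with K = 10: each model may raise K/2 = 5 alerts.
current_alerts = [1, 2, 3, 4, 5]
new_alerts = [6, 7, 8, 9, 10]
deleted_by_moderators = [1, 2, 6, 7, 8, 9]

current_online = online_precision(current_alerts, deleted_by_moderators)
new_online = online_precision(new_alerts, deleted_by_moderators)
gain = relative_gain(new_online, current_online)
```

Splitting the alert budget keeps the moderators' total workload at K while both models are compared on live data.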
  15. Offline/online evaluation result
    Algorithm   Offline   Online
    GBDT        +18.2%    Not released
    GMU         +21.2%    +23.2%
    The table shows the relative performance gain on one violation topic. The offline metric is Precision@K and the online metric is Precision@K/2.
    The baseline is a logistic regression model that was already released in production.
  16. Container-based Training Pipeline
    Write manifest files containing requirements such as CPU, GPU and storage.
    (Diagram: stages Data Load, Training, Offline Evaluation; resource labels CPU, CPU or GPU, CPU; BigQuery appears twice)
  17. Serving system architecture
    We manage over 15 machine learning models in production.
    (Diagram: message queues with subscribe/publish; a proxy layer with a proxy container; a prediction layer made of pods. The pod for a GBDT-based model has one preprocessing + inference container; the pods for deep learning based models have a preprocessing container and a Caffe2 inference container.)
  18. Horizontal Pod Autoscaler (HPA) by Kubernetes
    • Reliable system: traffic changes over time, and HPA can adapt to varying traffic
    • Lower billing cost: reduced to 1/6 with HPA
    (Chart: billing cost transition after applying HPA; axes: billing cost vs. day; each color is one machine learning model)
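To illustrate how HPA follows traffic, the sketch below implements the scaling rule given in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured replica bounds. The metric values and bounds are made-up numbers, and the sketch omits the tolerance band and stabilization behavior of the real controller; the slides do not state which metric or limits were used.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Core HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical numbers: average CPU utilization (%) against a 60% target.
peak = desired_replicas(4, 90, 60, min_replicas=1, max_replicas=10)
quiet = desired_replicas(6, 10, 60, min_replicas=1, max_replicas=10)
capped = desired_replicas(4, 200, 50, min_replicas=1, max_replicas=10)
```

Scaling down during quiet periods is what makes the cost reduction possible, while scaling up at peaks keeps the service reliable.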
  19. Impact of Machine Learning system
    The machine learning system has increased coverage by 554% over the rule-based approach.
    E.g. where the rule-based approach discovered 100 violating items, the machine learning system discovered an additional 554 violating items.
    (Diagram labels: Moderation Service, Rule based, Machine Learning, Hide & Alert, Moderator, Manual review)
  20. Questions and Thanks
    If you have a question about this talk, please e-mail the first author, Shunya Ueta: hurutoriya@mercari.com
    Acknowledgements
    Co-authors: Suganprabu Nagarajan, Mizuki Sango
    Contributors:
    • Abhishek Vilas Munagekar, Yusuke Shido, Vamshi Teja Racha, Sumit Verma and Keisuke Umezawa, for their contributions to this system
    • Dr. Antony, for his feedback on the paper
    • Yushi Kurita and Yuki Ito as Product Managers, all Trust and Safety project members, and all Customer Service staff working as moderators, for making this project a success