Merpay Tech Fest 2021_Fraud Prevention System Using Vertex Pipelines and Feature Store / Using Feature Store and Vertex Pipelines in Fraud Prevention System

Merpay Tech Fest 2021 is a five-day online technical conference.

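It runs for five days, from Monday, July 26 to Friday, July 30, 2021, and is aimed at software engineers working at IT companies and anyone interested in Merpay's technology stack. Merpay Tech Fest is a festival where you can deepen your interest in technology through its connection to the business and learn about the engineering that supports our products and services. The sessions will present the trial and error and the approaches we have taken to the organization, technology, and challenges that support the business. Stay tuned!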

■Event information
- Official website: https://events.merpay.com/techfest-2021/
- Registration page: https://mercari.connpass.com/event/215035/
- Twitter hashtag: #MerpayTechFest

■Links
- Mercari/Merpay event list: https://mercari.connpass.com/
- Mercari careers site: https://careers.mercari.com/
- Mercari Engineering Blog: https://engineering.mercari.com/blog/
- Twitter account for Mercari engineers: https://twitter.com/mercaridevjp
- Merpay, Inc.: https://jp.merpay.com/

mercari

July 28, 2021

Transcript

  1. #MerpayTechFest Session title: Using Feature Store and Vertex Pipelines in Fraud Prevention System. Liu Songjie, Software Engineer (Machine Learning)
  2. Liu Songjie, Software Engineer (Machine Learning). I joined Merpay as a new graduate in 2019 and have been working on machine learning solutions for fraud prevention, such as chargeback detection. Recently, I have been involved in developing ML pipelines based on Vertex AI.
  3. Overview
     01 Background
     02 Feature Store
     03 Training-Deploying Pipeline
     04 Summary
  4. Background
  5. Fraud Prevention System
     • Multiple fraud-detection solutions in the system
       ◦ Alert Filtering (multiple ML models)
         ▪ https://engineering.mercari.com/blog/entry/alertfiltering-ml/
       ◦ ChargeBack Detection (ML model)
         ▪ https://engineering.mercari.com/en/blog/entry/chargeback-ml/
       ◦ Sub Account Detection
       ◦ (New) Suspicious Account Detection (rule-based logic)
       ◦ (New) Suspicious Action Detection (complex network)
         ▪ https://engineering.mercari.com/blog/entry/complex-network-ml/
  6. Legacy Fraud Prevention System
  7. Features
     [Timeline figure, 2020 → 2021]
     • 2020: Alert Filtering Model, ChargeBack Detection, Sub Account Detection; feature dimension ~40
     • 2021: Alert Filtering Model × 4 sub-models, ChargeBack Detection (latest version), Sub Account Detection, Suspicious Account Detection × ~4 detections, Suspicious Action Detection × 2 mass-fraud detections; feature dimension ~170
     • The total feature dimension grew about fourfold within one year
  8. Features
     • Various data sources
       ◦ BigQuery, Spanner, GCS,…
     • Different data sources for training and prediction
       ◦ Some models must predict in real time, so they read from Spanner at prediction time for lower latency
     • The same or similar features are created multiple times for different models
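
One common way to keep training and real-time prediction consistent when they read from different stores (BigQuery offline, Spanner online) is to write each feature's logic once and call it from both paths. The sketch below assumes a hypothetical `Payment` record and a 24-hour payment-count feature; neither comes from the slides, and it only illustrates the idea.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Payment:
    """Hypothetical payment event; field names are illustrative."""
    user_id: str
    amount: int            # amount in yen
    created_at: datetime


def payment_stats(payments: Iterable[Payment], as_of: datetime,
                  window: timedelta = timedelta(hours=24)) -> Dict[str, int]:
    """The single feature definition shared by training and serving.

    Counts and sums payments in the half-open window [as_of - window, as_of).
    """
    recent = [p for p in payments if as_of - window <= p.created_at < as_of]
    return {
        "payment_count_24h": len(recent),
        "payment_sum_24h": sum(p.amount for p in recent),
    }


def build_training_rows(history: List[Payment],
                        examples: List[Tuple[str, datetime, int]]) -> List[Dict[str, object]]:
    """Offline path: features for each labeled (user_id, as_of, label) example,
    computed from a full history (e.g. exported from BigQuery)."""
    rows = []
    for user_id, as_of, label in examples:
        user_history = [p for p in history if p.user_id == user_id]
        rows.append({"user_id": user_id, **payment_stats(user_history, as_of), "label": label})
    return rows


def build_online_features(user_recent: List[Payment], now: datetime) -> Dict[str, int]:
    """Online path: the same function applied to a user's recent payments
    fetched from a low-latency store (e.g. Spanner)."""
    return payment_stats(user_recent, now)
```

Because both paths call `payment_stats`, a change to the feature logic cannot silently diverge between training and serving.
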
  9. Features
  10. Features
      • Maintenance cost
        ◦ A large number of features becomes difficult to maintain
      • Computing cost
        ◦ The same features are computed multiple times for different models
  11. Features
      [Venn diagram: Features in Model A, Features in Model B; regions: features only in Model A, features only in Model B, features in both Model A & B]
      • Features used by several models only need to be created once, so reusing them across models reduces cost
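
The saving the Venn diagram points at can be counted directly: without sharing, each model computes its own copy of every feature it uses; with a shared store, each distinct feature is computed once. A small sketch with hypothetical feature names:

```python
from typing import Dict, List, Set


def feature_overlap(model_features: Dict[str, Set[str]]) -> Dict[str, object]:
    """Compare per-model feature computation with computing each distinct feature once."""
    computed_without_sharing = sum(len(names) for names in model_features.values())
    distinct: Set[str] = set().union(*model_features.values())
    # A feature is "shared" if more than one model uses it.
    shared: List[str] = sorted(
        name for name in distinct
        if sum(name in names for names in model_features.values()) > 1
    )
    return {
        "computed_without_sharing": computed_without_sharing,
        "computed_with_sharing": len(distinct),
        "shared_features": shared,
    }


if __name__ == "__main__":
    # Hypothetical feature sets for two models.
    print(feature_overlap({
        "model_a": {"payment_count_24h", "account_age_days", "device_count"},
        "model_b": {"payment_count_24h", "account_age_days", "login_count_1h"},
    }))
```

For these two example models, sharing reduces the work from 6 feature computations to 4, because `account_age_days` and `payment_count_24h` are used by both.
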
  12. Legacy Architecture: every model has its own pipeline and runs separately
  13. Legacy Architecture
      [Diagram: Data Sources; Training set (BigQuery); Training (AI Platform Jobs); Serving (AI Platform Models); Predicting Input; Outputs]
  14. Legacy Architecture
      [Diagram with the same labels as slide 13]
  15. Legacy Architecture
      • A common training-deploying process
        ◦ Training sets are saved in BigQuery
        ◦ Models are trained with AI Platform Training Jobs
        ◦ Models are deployed on AI Platform Models
      • Differences between solutions
        ◦ Prediction inputs
          ▪ BigQuery, Spanner, ..
        ◦ Prediction outputs
          ▪ Connected to different microservices or systems
  16. Problems & Needs
      • Features
        ◦ Unnecessary cost of creating and maintaining features
        ◦ Need to improve data reliability
      • Legacy architecture
        ◦ Need for a shared training-deploying pipeline
  17. The new system architecture
      01 A feature store for all models
      02 A common pipeline for the training-deploying process
  18. The new system architecture
  19. Feature Store
  20. Feature Store
  21. Feature Store (Before)
  22. Feature Store (After)
  23. What is a Feature Store (ref: https://feast.dev/blog/what-is-a-feature-store/)
  24. Feature Store Candidates: Vertex Feature Store & Feast
      |                  | Vertex Feature Store | Feast |
      |------------------|----------------------|-------|
      | Stream ingesting | No                   | Yes   |
      | GCP support      | Native               | Good  |
      | Cost             | High                 | Low   |
      We chose to use Feast for now.
  25. Preparation
      • Define a FeatureView (schema)
        ◦ entities
        ◦ features
        ◦ input
        ◦ ...
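
To show what information a FeatureView carries, here is a hypothetical, simplified stand-in written as a plain dataclass. It mirrors the fields named on the slide (entities, features, input) plus a timestamp column and TTL, but it is not Feast's API; the class, its field names, and the `event_timestamp` default are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass
class FeatureViewSpec:
    """Hypothetical, simplified stand-in for a FeatureView definition.

    Mirrors the fields named on the slide (entities, features, input);
    it is NOT Feast's API.
    """
    name: str
    entities: List[str]            # join keys, e.g. ["user_id"]
    features: Dict[str, str]       # feature name -> value type
    input_table: str               # offline source table, e.g. in BigQuery
    timestamp_column: str = "event_timestamp"
    ttl: timedelta = timedelta(days=1)   # how old a value may be and still be used

    def validate(self) -> None:
        """Reject schemas whose column roles overlap."""
        if not self.entities:
            raise ValueError(f"{self.name}: at least one entity is required")
        clash = set(self.entities) & set(self.features)
        if clash:
            raise ValueError(f"{self.name}: used as both entity and feature: {sorted(clash)}")
        if self.timestamp_column in set(self.entities) | set(self.features):
            raise ValueError(f"{self.name}: timestamp column must not be an entity or feature")

    def source_columns(self) -> List[str]:
        """Columns the input table must provide, in a stable order."""
        return [*self.entities, self.timestamp_column, *self.features]
```

Keeping the schema in one declarative object is what lets a single registration step make it visible to every model that reads those features.
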
  26. Preparation
      • Run `feast apply`
  27. Preparation
      • Then access the feature store through the SDK
  28. Features in BigQuery (offline store)
      [Table figure with columns: timestamp, entity, feature]
      • Entity
      • Feature
      • Timestamp
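
Because every row carries an entity and a timestamp, the offline store can build training sets that only use feature values known at the time of each labeled event (a point-in-time join), which is what Feast's historical retrieval is designed to provide. The following plain-Python illustration shows those semantics under a simple TTL rule; it is not Feast's implementation, and the feature name used in practice would come from your own FeatureView.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple

# (entity key, event timestamp, feature values) as stored in the offline table
FeatureRow = Tuple[str, datetime, Dict[str, float]]


def point_in_time_join(
    entity_rows: List[Tuple[str, datetime]],
    feature_rows: List[FeatureRow],
    ttl: timedelta,
) -> List[Optional[Dict[str, float]]]:
    """For each (entity, event_time) request, return the most recent feature
    values recorded at or before event_time, or None if there are none or the
    newest one is older than ttl. Later rows never leak into earlier examples."""
    grouped: Dict[str, List[Tuple[datetime, Dict[str, float]]]] = defaultdict(list)
    for entity, ts, values in feature_rows:
        grouped[entity].append((ts, values))

    # Per entity: timestamps sorted ascending, plus the matching values.
    index: Dict[str, Tuple[List[datetime], List[Dict[str, float]]]] = {}
    for entity, rows in grouped.items():
        rows.sort(key=lambda r: r[0])
        index[entity] = ([ts for ts, _ in rows], [v for _, v in rows])

    joined: List[Optional[Dict[str, float]]] = []
    for entity, event_time in entity_rows:
        timestamps, values = index.get(entity, ([], []))
        i = bisect_right(timestamps, event_time)  # timestamps[:i] are <= event_time
        if i == 0 or event_time - timestamps[i - 1] > ttl:
            joined.append(None)
        else:
            joined.append(values[i - 1])
    return joined
```
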
  29. Data Stream
  30. Data Stream
  31. Training-Deploying Pipeline
  32. Training-Deploying Pipeline
  33. Vertex Pipelines
  34. Vertex Pipelines
      • Pipelines are defined with the Kubeflow Pipelines SDK or TensorFlow Extended (TFX)
        ◦ Our system uses the Kubeflow Pipelines SDK
      • Easier to get started with than Kubeflow Pipelines
        ◦ Kubeflow Pipelines must be deployed on a Kubernetes cluster
          ▪ Lots of permission issues
          ▪ Hard to set up and maintain
        ◦ Vertex Pipelines
          ▪ Provisions all resources
          ▪ Stores all artifacts
          ▪ Passes resources and artifacts between steps
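
To make "pass resources and artifacts between steps" concrete, here is a toy, in-process runner that performs only the orchestration part: ordering steps by their declared inputs and handing each step's output to the steps that consume it. It is a hypothetical illustration; real Vertex Pipelines definitions are written with the Kubeflow Pipelines SDK and compiled, each step runs in its own container, and the service stores the artifacts.

```python
from typing import Callable, Dict, List


def run_pipeline(steps: Dict[str, Callable[..., object]],
                 inputs: Dict[str, List[str]]) -> Dict[str, object]:
    """Run steps in dependency order and pass artifacts between them.

    steps:  step name -> function producing that step's output artifact
    inputs: step name -> upstream step names whose artifacts it receives,
            in positional-argument order
    """
    artifacts: Dict[str, object] = {}
    pending = dict(steps)
    while pending:
        ready = [name for name in pending
                 if all(dep in artifacts for dep in inputs.get(name, []))]
        if not ready:
            # A cycle, or an input that no step produces.
            raise ValueError(f"cannot resolve inputs for: {sorted(pending)}")
        for name in ready:
            func = pending.pop(name)
            artifacts[name] = func(*(artifacts[dep] for dep in inputs.get(name, [])))
    return artifacts
```

Wiring hypothetical get-data, train-model, and deploy-model functions with `{"train-model": ["get-data"], "deploy-model": ["train-model"]}` runs them in that order regardless of how the `steps` dict is ordered.
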
  35. Vertex Pipelines
  36. Get-data
      • Get data from the Feast offline store and save it into a Vertex AI Dataset
      • Run a Python script in a custom Docker container with the Feast SDK installed
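
A minimal sketch of what the get-data container's entrypoint script might look like. The `--output-path` flag, the column names, and `fetch_training_rows` are hypothetical; the Feast query is stubbed out so the sketch runs without Feast or BigQuery access, and uploading the file to Cloud Storage and registering it as a Vertex AI Dataset are left out.

```python
import argparse
import csv
import os
from typing import Dict, List, Optional


def fetch_training_rows() -> List[Dict[str, object]]:
    """Hypothetical placeholder for the Feast offline-store query.

    In the real step this would use the Feast SDK installed in the container
    (a historical-feature retrieval); fixed rows keep the sketch runnable.
    """
    return [
        {"user_id": "u1", "payment_count_24h": 3, "label": 0},
        {"user_id": "u2", "payment_count_24h": 41, "label": 1},
    ]


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser(description="get-data step (sketch)")
    parser.add_argument("--output-path", required=True,
                        help="where to write the training set as CSV")
    args = parser.parse_args(argv)

    rows = fetch_training_rows()
    if not rows:
        raise RuntimeError("offline store returned no rows")
    # Output directories may not exist inside a fresh container.
    os.makedirs(os.path.dirname(args.output_path) or ".", exist_ok=True)
    with open(args.output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return args.output_path
```

The container's entrypoint would call `main()`, which then reads its arguments from the command line passed in by the pipeline step.
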
  37. Train-model
      • Train the model in a Docker container with a Vertex AI custom container training job
      • Prepare the necessary libraries in the Docker image
      • Customize the training process with a Python script in the container
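
The training container's script only has to read the data and write a model artifact. In this minimal sketch a one-feature decision stump stands in for the real model, which the slides do not specify, and the file names and CSV columns are hypothetical. `AIP_MODEL_DIR` is the environment variable Vertex AI custom training sets for the model output location when an output directory is configured; the sketch assumes it is a locally writable path.

```python
import csv
import json
import os
from typing import List, Tuple


def load_rows(path: str, feature: str) -> List[Tuple[float, int]]:
    """Read (feature value, label) pairs from the get-data CSV."""
    with open(path, newline="") as f:
        return [(float(r[feature]), int(r["label"])) for r in csv.DictReader(f)]


def train_stump(rows: List[Tuple[float, int]]) -> float:
    """Choose the threshold t maximizing accuracy of 'fraud if value >= t'."""
    if not rows:
        raise ValueError("empty training set")

    def accuracy(t: float) -> float:
        return sum((x >= t) == bool(y) for x, y in rows) / len(rows)

    return max(sorted({x for x, _ in rows}), key=accuracy)


def save_model(threshold: float, feature: str) -> str:
    """Write the model to the directory named by AIP_MODEL_DIR.

    Assumes a local path; on Vertex AI the value is usually a gs:// URI,
    which needs a Cloud Storage client instead of open().
    """
    model_dir = os.environ.get("AIP_MODEL_DIR", "model_output")
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "model.json")
    with open(path, "w") as f:
        json.dump({"feature": feature, "threshold": threshold}, f)
    return path


def main(train_path: str, feature: str = "payment_count_24h") -> str:
    return save_model(train_stump(load_rows(train_path, feature)), feature)
```
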
  38. Deploy-model
      • Deploy the model to an endpoint and start serving
      • Access the endpoint using curl
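
The curl call on the slide targets the endpoint's REST `predict` method. The same request, built in Python with only the standard library, is sketched below. The project, region, endpoint ID, and instance fields are placeholders, and the instance format actually depends on the deployed model's serving container.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_predict_request(project: str, location: str, endpoint_id: str,
                          instances: List[Dict[str, Any]],
                          access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to a Vertex AI endpoint's :predict method."""
    url = (f"https://{location}-aiplatform.googleapis.com/v1/"
           f"projects/{project}/locations/{location}/endpoints/{endpoint_id}:predict")
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )
```

Sending it is `urllib.request.urlopen(req)`; the access token can come from, for example, `gcloud auth print-access-token`, which is also what a curl-based call typically uses.
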
  39. Pipelines
  40. Summary
  41. Summary
      • The feature store and the common training-deploying pipeline bring the following benefits
        ◦ ML engineers can focus more on model development and improvement
        ◦ Increased development efficiency
        ◦ Smoother onboarding and collaboration
      • This architecture is still tentative; we will keep improving it
  42. Future Work
      • A common pipeline for the prediction process
      • Store prediction results and graph features in the feature store for analysis and modeling
      • Continuous monitoring and quality control
      • Producing supervised models at scale
  43. Thank you for listening!