

Merpay Tech Fest 2021: Using Feature Store and Vertex Pipelines in Fraud Prevention System

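Merpay Tech Fest 2021 is a five-day online technology conference.

It runs for five days, from Monday, July 26 to Friday, July 30, 2021, and is aimed at software engineers working at IT companies and anyone interested in Merpay's technology stack. Merpay Tech Fest is a festival where you can deepen your interest in technology through its connection to the business and learn about the engineering that supports our products and services. The sessions will introduce the trial and error and the approaches taken toward the organization, technology, and challenges that support the business. We hope you enjoy it!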

■ Event information
- Official website: https://events.merpay.com/techfest-2021/
- Registration page: https://mercari.connpass.com/event/215035/
- Twitter hashtag: #MerpayTechFest

■ Links
- Mercari and Merpay event list: https://mercari.connpass.com/
- Mercari careers site: https://careers.mercari.com/
- Mercari engineering blog: https://engineering.mercari.com/blog/
- Mercari Twitter account for engineers: https://twitter.com/mercaridevjp
- Merpay, Inc.: https://jp.merpay.com/

mercari

July 28, 2021


Transcript

  1. #MerpayTechFest
    Session Title: Using Feature Store and Vertex Pipelines in Fraud Prevention System
    Liu Songjie, Software Engineer (Machine Learning)
  2. Liu Songjie, Software Engineer (Machine Learning)
    I joined Merpay as a new graduate in 2019. I have been working on machine learning solutions for fraud prevention, such as chargeback detection. Recently, I've been involved in the development of Vertex AI-based ML pipelines.
  3. Fraud Prevention System
    • Multiple fraud detection solutions in the system
      ◦ Alert Filtering (multiple ML models)
        ▪ https://engineering.mercari.com/blog/entry/alertfiltering-ml/
      ◦ ChargeBack Detection (ML model)
        ▪ https://engineering.mercari.com/en/blog/entry/chargeback-ml/
      ◦ Sub Account Detection
      ◦ (New) Suspicious Account Detection (rule-based logic)
      ◦ (New) Suspicious Action Detection (complex network)
        ▪ https://engineering.mercari.com/blog/entry/complex-network-ml/
  4. Features
    [Timeline figure, 2020 to 2021]
    • 2020 (dimension ~ 40)
      ◦ Alert Filtering Model
      ◦ ChargeBack Detection
      ◦ Sub Account Detection
    • 2021 (dimension ~ 170)
      ◦ Alert Filtering Model × 4 sub models
      ◦ ChargeBack Detection (latest version)
      ◦ Sub Account Detection
      ◦ Suspicious Account Detection × ~ 4 detections
      ◦ Suspicious Action Detection × 2 mass fraud detections
    The total dimension of the features increased about fourfold within one year
  5. Features
    • Various data sources
      ◦ BigQuery, Spanner, GCS,…
    • Different data sources between training and predicting
      ◦ Some of the models need to predict in real time, so they read from Spanner at prediction time for lower latency
    • The same or similar features are created multiple times for different models
  6. Features
    • Cost of maintenance
      ◦ A large number of features becomes difficult to maintain
    • Cost of computing resources
      ◦ The same features are created multiple times for different models
  7. Features
    [Venn diagram: "Features in Model A" and "Features in Model B", split into features only in Model A, features in both Model A & B, and features only in Model B]
    We can reduce costs by reusing features across different models
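The overlap in the diagram can be made concrete with set operations. The following is a minimal sketch only: the feature names are hypothetical (the talk does not list the real ones), and it assumes every feature costs roughly the same to compute.

```python
# Hypothetical feature names, chosen only for illustration.
model_a_features = {"txn_count_7d", "avg_amount_30d", "account_age_days", "device_count"}
model_b_features = {"txn_count_7d", "account_age_days", "chargeback_count_90d"}

# The three regions of the Venn diagram.
shared = model_a_features & model_b_features
only_a = model_a_features - model_b_features
only_b = model_b_features - model_a_features

# Without reuse, each model computes its own copy of every feature it needs.
computed_without_reuse = len(model_a_features) + len(model_b_features)
# With a shared feature store, each distinct feature is computed once.
computed_with_reuse = len(model_a_features | model_b_features)

saved = computed_without_reuse - computed_with_reuse
print(sorted(shared), saved)
```

Under these assumptions the number of feature computations saved equals the size of the intersection, which is why the benefit grows as more models draw on the same features.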
  8. Legacy Architecture
    [Architecture diagram. Labels: Data Sources; Training set (BigQuery); Training (AI Platform Jobs); Serving (AI Platform Models); Predicting Input; Outputs]
  10. Legacy Architecture
    • A common training-deploying process
      ◦ Training sets are saved in BigQuery
      ◦ Models are trained by AI Platform Training Jobs
      ◦ Models are deployed on AI Platform Models
    • Differences between solutions
      ◦ Predicting inputs
        ▪ BigQuery, Spanner, ..
      ◦ Predicting outputs
        ▪ Connect to different microservices or systems
  11. Problems & Needs
    • About Features
      ◦ Unnecessary costs of creating and maintaining features
      ◦ Need to improve data reliability
    • About Legacy Architecture
      ◦ Need for a shared training-deploying pipeline
  12. The new system architecture
    01 A feature store for all models
    02 A common pipeline for the training-deploying process
  13. Feature Store Candidates: Vertex Feature Store & Feast

                       Vertex Feature Store    Feast
    Stream Ingesting   No                      Yes
    GCP Support        Native                  Good
    Cost               High                    Low

    We chose to use Feast for now
  14. Vertex Pipelines
    • Based on Kubeflow Pipelines SDK or TensorFlow Extended
      ◦ Kubeflow Pipelines SDK is used in our system
    • Easier to get started with than Kubeflow Pipelines
      ◦ Kubeflow Pipelines must be deployed on a Kubernetes cluster
        ▪ Lots of permission issues
        ▪ Hard to start and maintain
      ◦ Vertex Pipelines
        ▪ Provisions all resources
        ▪ Stores all the artifacts
        ▪ Passes resources and artifacts between steps
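The idea of a pipeline that stores each step's output and passes it to later steps can be illustrated in plain Python. This is a toy stand-in, not the Kubeflow Pipelines SDK or Vertex Pipelines API: the `ToyPipeline` class, the step functions, and the sample rows are all hypothetical. Only the step names (get-data, train-model, deploy-model) follow the slides in this talk.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class ToyPipeline:
    """Runs steps in order and keeps every step's output as a named artifact."""

    def __init__(self) -> None:
        self.steps: List[Tuple[str, Callable[..., Any], Tuple[str, ...]]] = []
        self.artifacts: Dict[str, Any] = {}

    def add_step(self, name: str, func: Callable[..., Any],
                 inputs: Sequence[str] = ()) -> None:
        self.steps.append((name, func, tuple(inputs)))

    def run(self) -> Dict[str, Any]:
        for name, func, inputs in self.steps:
            # Look up the artifacts produced by earlier steps and pass them in.
            args = [self.artifacts[i] for i in inputs]
            self.artifacts[name] = func(*args)
        return self.artifacts


def get_data() -> List[Dict[str, int]]:
    # Hypothetical training rows; a real step would read from the feature store.
    return [{"amount": 100, "label": 0}, {"amount": 900, "label": 1}]


def train_model(rows: List[Dict[str, int]]) -> Dict[str, float]:
    # A deliberately trivial "model": flag amounts above the mean.
    return {"threshold": sum(r["amount"] for r in rows) / len(rows)}


def deploy_model(model: Dict[str, float]) -> Callable[[int], int]:
    # "Deploying" here just returns a callable that serves predictions.
    return lambda amount: int(amount > model["threshold"])


pipeline = ToyPipeline()
pipeline.add_step("get-data", get_data)
pipeline.add_step("train-model", train_model, inputs=["get-data"])
pipeline.add_step("deploy-model", deploy_model, inputs=["train-model"])

artifacts = pipeline.run()
predict = artifacts["deploy-model"]
print(artifacts["train-model"], predict(950), predict(50))
```

In a managed service the steps would run in separate containers and the artifacts would live in durable storage. The sketch only shows the dependency wiring that a shared pipeline definition gives every model.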
  15. Get-data
    • Get data from the Feast offline store and save it into a Vertex AI Dataset
    • Run a Python script in a custom Docker container with the Feast SDK installed
  16. Train-model
    • Train the model in a Docker container using a Vertex AI custom container training job
    • Prepare the necessary libraries in the Docker image
    • Customize the training process with a Python script in the container
  17. Deploy-model
    • Deploy the model to the endpoint and start serving
    • Access the endpoint using curl
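For reference, an online prediction call to a Vertex AI endpoint is an authenticated HTTP POST to the endpoint's `:predict` URL with a JSON body containing an `instances` list. The sketch below builds such a request with the Python standard library instead of curl and does not send it. The project, region, endpoint ID, token, and instance fields are placeholders, and the instance schema depends on the deployed model, which the talk does not describe.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_predict_request(project: str, region: str, endpoint_id: str,
                          instances: List[Dict[str, Any]],
                          access_token: str) -> urllib.request.Request:
    """Build (but do not send) a Vertex AI online prediction request."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/projects/{project}"
        f"/locations/{region}/endpoints/{endpoint_id}:predict"
    )
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; the instance fields are hypothetical.
req = build_predict_request(
    project="my-project",
    region="us-central1",
    endpoint_id="1234567890",
    instances=[{"amount": 950}],
    access_token="TOKEN",
)
print(req.get_method(), req.full_url)
# With a real access token, urllib.request.urlopen(req) would send the request.
```

The equivalent curl call passes the same URL, the two headers, and the JSON body with `-d`. A token for the `Authorization` header can be obtained with `gcloud auth print-access-token`.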
  18. Summary
    • The feature store and the common training-predicting pipeline bring the following benefits
      ◦ ML engineers can focus more on model development and improvement
      ◦ Increased development efficiency
      ◦ Smoother onboarding and collaboration
    • This architecture is still a tentative plan; we will keep improving it in the future
  19. Future Work
    • A common pipeline for the prediction process
    • Store prediction results / graph features in the feature store for analysis and modeling
    • Continuous monitoring and quality control
    • Mass-producing supervised models