Slide 1

Slide 1 text

Session Title: Accelerating Fraud Detection Using Data on Fraud Risk of Potential Transactions
Liu Songjie / Victoria Li
Merpay Fraud Prevention Team

Slide 2

Slide 2 text

Liu Songjie
Merpay Machine Learning Engineer
Joined Merpay in October 2019
● Fraud Prevention Team
  ○ Machine Learning Engineer

Slide 3

Slide 3 text

Victoria Li
Merpay Machine Learning Engineer
Joined Yahoo! JAPAN in April 2017
● Software Engineer
● From January 2019: Machine Learning Engineer
Joined Merpay in September 2021
● Machine Learning Platform
● Feature Store
● GraphDB

Slide 4

Slide 4 text

Fraud Prevention Models
● Alert Filtering (multiple ML models)
  ○ ref: https://engineering.mercari.com/blog/entry/alertfiltering-ml/
● ChargeBack Detection (ML model)
  ○ ref: https://engineering.mercari.com/en/blog/entry/chargeback-ml/
● Suspicious Action Detection (complex network)
  ○ ref: https://engineering.mercari.com/blog/entry/complex-network-ml/
● Potential Transaction Risk Detection
● (New) Graph-Based Fraud Detection
● etc.

Slide 5

Slide 5 text

Past Events
● Merpay Tech Fest 2021
  ○ Using Feature Store and Vertex Pipelines in Fraud Prevention System
    ■ ref: https://speakerdeck.com/mercari/using-feature-store-and-vertex-pipelines-in-fraud-prevention-system
● Merpay Tech Fest 2022
  ○ Machine Learning infrastructure using Feature Store and Vertex AI
    ■ ref: https://speakerdeck.com/mercari/machine-learning-infrastructure-using-feature-store-and-vertex-ai
  ○ Graph Theory and Anti-Fraud Measures: Unravel connections between data
    ■ ref: https://speakerdeck.com/mercari/graph-theory-and-anti-fraud-measures-unravel-connections-between-data

Slide 6

Slide 6 text

Content
01 Background
02 Mechanism
03 System Architecture
04 Feature Store
05 Summary

Slide 7

Slide 7 text

Background
Two Types of Fraud Detection
● Afterward Detection
  ○ Stop delivery of the product after the fraudulent transaction has been made
  ○ Risk of missing fraud
● Immediate Detection
  ○ Take immediate action as soon as a fraudulent transaction is discovered
  ○ Requires low latency
ref: https://mercan.mercari.com/articles/31223/

Slide 8

Slide 8 text

Background
Immediate Fraud Detection
● Judge Service: an immediate fraud prevention system with low latency (0.1 s)
  ○ Provided by the TnS backend team
● ML solutions: latency ranging from daily batch down to 1 minute
  ○ Provided by the ML team
Judge Service: https://mercan.mercari.com/articles/31223/

Slide 9

Slide 9 text

Content
01 Background
02 Mechanism
03 System Architecture
04 Feature Store
05 Summary

Slide 10

Slide 10 text

Mechanism
Provide a Potential Risk Rank before the transaction is made

Slide 11

Slide 11 text

Mechanism
Some features (attribute data) already exist or are updated before the transaction takes place:
● item information
● seller information
● buyer information
● …
These features let us evaluate the risk of a potential transaction before it is completed.

Slide 12

Slide 12 text

Mechanism

Slide 13

Slide 13 text

Mechanism
Process
● ML solutions (ML models) calculate the risk of a potential transaction after the attribute data is created or updated
● The potential risk is sent to the Judge Service and saved in a database
● The Judge Service assesses the risk of transactions in real time as they occur
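The three-step process above can be sketched as follows. This is a minimal illustration under stated assumptions, not Merpay's implementation: the rule-based scorer standing in for the ML model, its thresholds, the `HIGH`/`MEDIUM`/`LOW` ranks, the `REVIEW`/`ALLOW` decisions, and keying the risk by `item_id` are all hypothetical. The point is that the expensive model work happens when attribute data changes, so the Judge Service only needs a key lookup at transaction time.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RiskStore:
    """Hypothetical stand-in for the database the Judge Service reads from."""
    ranks: Dict[str, str] = field(default_factory=dict)

    def save(self, item_id: str, rank: str) -> None:
        self.ranks[item_id] = rank

    def lookup(self, item_id: str) -> Optional[str]:
        return self.ranks.get(item_id)


def score_potential_transaction(item: dict, seller: dict) -> str:
    """Toy rule-based scorer standing in for the real ML model (hypothetical)."""
    score = 0.0
    if item["price"] >= 50_000:          # expensive items carry more risk
        score += 0.5
    if seller["account_age_days"] < 7:   # brand-new seller accounts carry more risk
        score += 0.4
    if score >= 0.8:
        return "HIGH"
    if score >= 0.4:
        return "MEDIUM"
    return "LOW"


def on_attributes_updated(store: RiskStore, item: dict, seller: dict) -> None:
    """Steps 1 and 2: recompute the risk when attribute data changes, then save it."""
    store.save(item["item_id"], score_potential_transaction(item, seller))


def judge(store: RiskStore, item_id: str) -> str:
    """Step 3: at transaction time, only a cheap key lookup is needed."""
    return "REVIEW" if store.lookup(item_id) == "HIGH" else "ALLOW"
```

An item with no precomputed rank falls back to `ALLOW` in this sketch; how to treat unscored transactions is a policy choice the slides do not specify.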

Slide 14

Slide 14 text

Mechanism
Batch prediction pain point

Slide 15

Slide 15 text

Mechanism
Latency of ML solutions

Slide 16

Slide 16 text

Mechanism
To accelerate the process, add a stream solution alongside the existing batch solution, using:
● Feast Online Store
● Stream prediction system

Slide 17

Slide 17 text

Content
01 Background
02 Mechanism
03 System Architecture
04 Feature Store
05 Summary

Slide 18

Slide 18 text

System Architecture

Slide 19

Slide 19 text

System Architecture
4 Parts
● Feast (Feature Store)
  ○ Introduced in the next part
● Batch system
  ○ Uses Vertex Pipelines
● Stream system
  ○ Microservices
● Publish API
  ○ A service that publishes results to the Judge Service

Slide 20

Slide 20 text

Batch System

Slide 21

Slide 21 text

Batch System
● Cloud Scheduler
  ○ daily / hourly
● Pub/Sub
● Cloud Run
  ○ Vertex Pipelines Executor
● Vertex Pipelines
● Spanner
● BigQuery

Slide 22

Slide 22 text

Stream System

Slide 23

Slide 23 text

Stream System

Slide 24

Slide 24 text

Stream System

Slide 25

Slide 25 text

Publish API

Slide 26

Slide 26 text

Publish API
Outbox Table

CREATE TABLE outbox (
  id STRING(64) NOT NULL,
  -- Information used in the Pub/Sub message
  information_1 INT64 NOT NULL,
  information_2 STRING(64) NOT NULL,
  …,
  -- Pub/Sub topic
  pubsub_topic STRING(64) NOT NULL,
  created TIMESTAMP NOT NULL OPTIONS ( allow_commit_timestamp = true ),
  updated TIMESTAMP NOT NULL OPTIONS ( allow_commit_timestamp = true ),
  -- Whether this record has been published
  is_published BOOL NOT NULL,
  -- Don't publish until this time
  schedule TIMESTAMP,
) PRIMARY KEY(id);

Slide 27

Slide 27 text

Publish API
● Deployed to Cloud Run
● Publishes messages to the Judge Service
● Triggered by
  ○ Cloud Scheduler (every minute) for the batch system
  ○ Pub/Sub for the stream system
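The outbox table and the Publish API together follow the transactional outbox pattern: results are first written as rows, and a publisher later sends the unpublished rows and marks them as published. The sketch below models the scheduler-triggered polling path using Python's built-in `sqlite3` in place of Spanner and a plain callback in place of a Pub/Sub client. The reduced column set, the ISO-8601 text encoding of `schedule` (assuming every timestamp is written in the same UTC format), and the `publish_pending` function are assumptions for illustration.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List

# Simplified, SQLite-flavoured version of the outbox table (hypothetical column subset).
SCHEMA = """
CREATE TABLE outbox (
    id            TEXT PRIMARY KEY,
    information_1 INTEGER NOT NULL,
    pubsub_topic  TEXT NOT NULL,
    is_published  INTEGER NOT NULL DEFAULT 0,
    schedule      TEXT  -- ISO-8601 UTC timestamp; NULL means publish immediately
)
"""


def publish_pending(
    conn: sqlite3.Connection,
    publish: Callable[[str, dict], None],
    now: datetime,
) -> List[str]:
    """Publish every unpublished row whose schedule has passed, then mark it published."""
    rows = conn.execute(
        "SELECT id, information_1, pubsub_topic FROM outbox "
        "WHERE is_published = 0 AND (schedule IS NULL OR schedule <= ?) "
        "ORDER BY id",
        (now.isoformat(),),
    ).fetchall()
    published = []
    for row_id, info, topic in rows:
        # Publish first, then mark: a crash in between re-sends the row on the next
        # run (at-least-once delivery), so the consumer should tolerate duplicates.
        publish(topic, {"id": row_id, "information_1": info})
        conn.execute("UPDATE outbox SET is_published = 1 WHERE id = ?", (row_id,))
        conn.commit()
        published.append(row_id)
    return published
```

In the system described here, the same polling logic would run on Cloud Run, triggered every minute by Cloud Scheduler for the batch path; the stream path is triggered by Pub/Sub instead.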

Slide 28

Slide 28 text

Publish API

Slide 29

Slide 29 text

Content
01 Background
02 Mechanism
03 System Architecture
04 Feature Store
05 Summary

Slide 30

Slide 30 text

System Architecture

Slide 31

Slide 31 text

Feature Store
In addition to prediction, serving features in near real time with low latency is also crucial. To address this, we have implemented the Feast Online Store, which lets us store and serve features.

Slide 32

Slide 32 text

Life Before a Feature Store
Source: https://www.hopsworks.ai/post/feature-store-the-missing-data-layer-in-ml-pipelines

Slide 33

Slide 33 text

Life After a Feature Store
Source: https://www.hopsworks.ai/post/feature-store-the-missing-data-layer-in-ml-pipelines

Slide 34

Slide 34 text

Feature Store Architecture
Online Store and Offline Store
Offline Store
● Moderate to high latency
● Stores historical features for each entity
● Mainly used for model training and batch prediction
Online Store
● Low latency
● Stores only the latest features for each entity
● Mainly used for model serving
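The contrast above can be made concrete with two toy in-memory stores. This sketches the concepts only, not Feast's storage layer: the class names, the single scalar value per entity, and the point-in-time `as_of` lookup are illustrative assumptions. The offline store keeps every version so training data can be reconstructed as of a past timestamp, while the online store keeps one latest value per entity for fast serving.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple


class OfflineStore:
    """Keeps the full history of feature values for each entity."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[datetime, Any]]] = defaultdict(list)

    def write(self, entity: str, ts: datetime, value: Any) -> None:
        rows = self._history[entity]
        rows.append((ts, value))
        rows.sort(key=lambda row: row[0])  # keep rows ordered by event time

    def as_of(self, entity: str, ts: datetime) -> Optional[Any]:
        """Point-in-time lookup: the latest value at or before ts (for training)."""
        rows = self._history.get(entity, [])
        i = bisect_right([row[0] for row in rows], ts)
        return rows[i - 1][1] if i else None


class OnlineStore:
    """Keeps only the latest feature value for each entity (for serving)."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tuple[datetime, Any]] = {}

    def write(self, entity: str, ts: datetime, value: Any) -> None:
        current = self._latest.get(entity)
        if current is None or ts >= current[0]:  # ignore older, out-of-order events
            self._latest[entity] = (ts, value)

    def get(self, entity: str) -> Optional[Any]:
        current = self._latest.get(entity)
        return current[1] if current else None
```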

Slide 35

Slide 35 text

Feature Store Architecture

Slide 36

Slide 36 text

Feature Store
Which features should be materialized, and which should be ingested by stream?
● Stream ingestion
  ○ Real-time features
● Materialization
  ○ Non-real-time features (aggregated features, etc.)
    ■ e.g., window features
    ■ https://engineering.mercari.com/en/blog/entry/chargeback-ml/
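To illustrate why window features fit materialization, the function below computes a hypothetical "transactions in the last 24 hours" count per entity from historical events. It scans a time range of history rather than reacting to a single incoming event, which is the kind of computation a batch job can run before materializing the result. The `(entity, timestamp)` event format and the default window size are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def window_count(
    events: Iterable[Tuple[str, datetime]],
    end: datetime,
    window: timedelta = timedelta(hours=24),
) -> Dict[str, int]:
    """Count events per entity inside the half-open window (end - window, end]."""
    start = end - window
    counts = Counter(entity for entity, ts in events if start < ts <= end)
    return dict(counts)
```

A real-time feature, by contrast (for example, hypothetically, an item's latest price), comes directly from a single event and can be pushed to the online store through stream ingestion as it arrives.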

Slide 37

Slide 37 text

Scheduled/On-demand materialization
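Feast exposes both modes on its `FeatureStore` class: `materialize(start_date, end_date)` loads an explicit time range on demand, and `materialize_incremental(end_date)` continues from where the previous run ended, which suits a scheduled job. The toy class below imitates that behaviour in plain Python so it runs without Feast. It is a simplified sketch, not Feast's implementation; the `Materializer` name and the `(entity, event_timestamp, value)` row format are assumptions.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, datetime, Any]  # (entity, event_timestamp, value)


class Materializer:
    """Toy copy of offline rows into an online dict (hypothetical, not Feast's code)."""

    def __init__(self, offline_rows: List[Row]) -> None:
        self.offline_rows = offline_rows
        self.online: Dict[str, Tuple[datetime, Any]] = {}
        self.last_end: Optional[datetime] = None

    def materialize(self, start: datetime, end: datetime) -> int:
        """On-demand: load rows with start < ts <= end, keeping the newest per entity."""
        written = 0
        for entity, ts, value in self.offline_rows:
            if start < ts <= end:
                current = self.online.get(entity)
                if current is None or ts >= current[0]:
                    self.online[entity] = (ts, value)
                    written += 1
        return written

    def materialize_incremental(self, end: datetime) -> int:
        """Scheduled: continue from the previous end time (or from the beginning)."""
        start = self.last_end or datetime.min
        written = self.materialize(start, end)
        self.last_end = end
        return written
```

A scheduler would call `materialize_incremental` on each tick, while `materialize` covers one-off backfills of a chosen range.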

Slide 38

Slide 38 text

FeatureView between Online/Offline Store
[Figure: Feast, backed by BigQuery and GCS, with FeatureView1 to FeatureView4 (…) grouped into Feature Service1 to Feature Service3 (…), used for Model A, Model B, and Model C training (…)]
Source: https://speakerdeck.com/mercari/machine-learning-infrastructure-using-feature-store-and-vertex-ai?slide=12

Slide 39

Slide 39 text

FeatureView between Online/Offline Store
How to manage a FeatureView between the online and offline stores
● Suppose FeatureView M in the offline store has the following features:
  ○ Feature A (real-time feature)
  ○ Feature B (non-real-time feature)
● To load its data into the online store, we need to choose between stream ingestion and materialization.

Slide 40

Slide 40 text

FeatureView between Online/Offline Store
Here is the problem:
● Feature A (real-time feature)
  ○ could be ingested from Kafka (stream)
● Feature B (non-real-time feature)
  ○ could be ingested by materialization
Because both ingestion paths write to the same feature view, simultaneous ingestion can cause one path to overwrite the other's data.
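The overwrite problem can be reproduced with a toy online store. The sketch assumes, for illustration only, that each write replaces the entity's whole row for a feature view; it does not model any particular Feast online-store backend. `feature_view_m`, `feature_a`, `feature_b`, and the values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Toy online store: one row per (feature view, entity); each write replaces the row.
Store = Dict[Tuple[str, str], Dict[str, Optional[float]]]


def write_row(store: Store, view: str, entity: str,
              features: Dict[str, Optional[float]]) -> None:
    store[(view, entity)] = dict(features)


def shared_view_scenario() -> Dict[str, Optional[float]]:
    """Both ingestion paths write to the same view; return the final row."""
    store: Store = {}
    # Stream ingestion delivers a fresh value of Feature A; it has no value for Feature B.
    write_row(store, "feature_view_m", "user_1", {"feature_a": 0.9, "feature_b": None})
    # Materialization then writes the whole view from the offline store,
    # where Feature A still holds an older value.
    write_row(store, "feature_view_m", "user_1", {"feature_a": 0.1, "feature_b": 42.0})
    return store[("feature_view_m", "user_1")]
```

Here the fresh stream value `0.9` is lost; if the writes arrived in the opposite order, the stream write would instead blank out Feature B.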

Slide 41

Slide 41 text

FeatureView between Online/Offline Store

Slide 42

Slide 42 text

FeatureView between Online/Offline Store

Slide 43

Slide 43 text

FeatureView between Online/Offline Store
Solution
● Create separate FeatureViews for stream ingestion and for materialization
We'll have 3 feature views:
● Original FeatureView
  ○ Includes all features
● FeatureView for materialization
  ○ Contains only the features ingested by materialization
● FeatureView for stream ingestion
  ○ Contains only the features ingested by stream
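In a toy model where each write replaces one row per feature view and entity (an assumption, not Feast's documented behaviour), splitting the features into two views means the two ingestion paths never touch the same row, and serving reads both views and merges them. In Feast, a `FeatureService` can group features from several feature views for serving; here that is approximated by the hypothetical `read_merged` helper, and the view names are placeholders. The original FeatureView with all features would remain in the offline store for training.

```python
from typing import Dict, List, Optional, Tuple

Store = Dict[Tuple[str, str], Dict[str, Optional[float]]]


def write_row(store: Store, view: str, entity: str,
              features: Dict[str, Optional[float]]) -> None:
    """Each write replaces the entity's whole row for that view (toy assumption)."""
    store[(view, entity)] = dict(features)


def read_merged(store: Store, views: List[str],
                entity: str) -> Dict[str, Optional[float]]:
    """Combine the entity's rows from several views, as a serving-time lookup would."""
    merged: Dict[str, Optional[float]] = {}
    for view in views:
        merged.update(store.get((view, entity), {}))
    return merged


store: Store = {}
# Stream ingestion writes only to the stream view (real-time Feature A).
write_row(store, "feature_view_m_stream", "user_1", {"feature_a": 0.9})
# Materialization writes only to the materialized view (non-real-time Feature B).
write_row(store, "feature_view_m_materialized", "user_1", {"feature_b": 42.0})
serving = read_merged(
    store, ["feature_view_m_stream", "feature_view_m_materialized"], "user_1"
)
```

Because the two views hold disjoint features, the order in which the writes arrive no longer matters.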

Slide 44

Slide 44 text

Content
01 Background
02 Mechanism
03 System Architecture
04 Feature Store
05 Summary

Slide 45

Slide 45 text

New mechanism for enhancing fraud detection speed
Calculate a Potential Risk Rank and provide it to the immediate fraud prevention system before the transaction is made

Slide 46

Slide 46 text

Implementation of the mechanism
By adding a stream system to the existing batch system, we have taken one step closer to a comprehensive fraud prevention solution.

Slide 47

Slide 47 text

Coming soon
● Graph databases in the fraud prevention system!
● Graph-based approaches are remarkably effective at detecting fraud.
Stay tuned for upcoming blog posts with more details!

Slide 48

Slide 48 text

No content