Slide 1

Slide 1 text

Microservices-friendly Data Pipeline merpay DataPlatform @syucream

Slide 2

Slide 2 text

Microservices in merpay ServiceA ServiceB ServiceC App App App

Slide 3

Slide 3 text

Data sources vs. use cases

Data sources: A-service, B-service, C-service, D-service
Use cases: KPI Analytics, Fraud Detection, Credit Scoring, Funnel Analytics, ML system, Customer Support

Slide 4

Slide 4 text

microservice -A microservice -B microservice -C Data Pipeline datauser -A datauser-B BigQuery BigQuery Event Log Event Log Event Log DB DB DB batch transfer stream transfer batch transfer batch transfer stream transfer stream transfer BqLoad Tool Cloud Storage ToGCS Tool? Cloud Dataflow BqLoad Tool Publish message Subscribe message merpay DataPipeline

Slide 5

Slide 5 text

Batch (prototype) microservice-B merpay-dataplatform Data User - A microservice-A Cloud Pub/Sub BigQuery Cloud Functions data mart change notification Pub/Sub trigger (BqLoad path) BqLoad Cloud SQL data lake Cloud Storage Cloud Spanner microservice-C Cloud Datastore Data Pipeline
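One reading of the labels on this slide is that a Cloud Storage change notification reaches Cloud Functions through a Pub/Sub trigger, and the function then decides what BqLoad should load into BigQuery. The following is a minimal sketch of that decision step only, not the actual merpay implementation. The function name `plan_load` and the object path convention `<dataset>/<table>/<file>` are hypothetical; the event shape (base64-encoded JSON object metadata in `data`, `eventType` in `attributes`) follows the standard Cloud Storage notification format for Pub/Sub.

```python
import base64
import json
from typing import Optional


def plan_load(event: dict) -> Optional[dict]:
    """Turn a Pub/Sub-delivered Cloud Storage notification into a load plan.

    Assumption (hypothetical convention): objects in the data lake bucket
    are named <dataset>/<table>/<file>.
    """
    attributes = event.get("attributes") or {}
    # Only newly written objects should trigger a load; ignore deletes etc.
    if attributes.get("eventType", "OBJECT_FINALIZE") != "OBJECT_FINALIZE":
        return None

    # The notification payload is the object's metadata as base64-encoded JSON.
    obj = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    parts = obj["name"].split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected object path: {obj['name']}")

    return {
        "source_uri": f"gs://{obj['bucket']}/{obj['name']}",
        "dataset": parts[0],
        "table": parts[1],
    }


if __name__ == "__main__":
    metadata = {"bucket": "example-data-lake", "name": "service_a/users/dump.avro"}
    sample = {
        "data": base64.b64encode(json.dumps(metadata).encode("utf-8")).decode("ascii"),
        "attributes": {"eventType": "OBJECT_FINALIZE"},
    }
    print(plan_load(sample))
```

Keeping the planning step free of any Google Cloud client calls makes it easy to unit-test; the real load job would be submitted separately using the returned URI and table.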

Slide 6

Slide 6 text

Stream (prototype)

Microservice platform team: Kubernetes cluster (A-service, B-service)
→ stdout via logging library → Logging
→ Sink to Pub/Sub → Cloud Pub/Sub
→ Subscribe → Cloud Dataflow (merpay-dataplatform)
→ Streaming Insert → BigQuery DWH
→ Data User - A: BigQuery
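The first hop in this slide, "stdout via logging library", can be sketched as a service that writes one JSON object per line to stdout so that a cluster logging agent can collect it. This is only an illustration using the Python standard library, not the logging library used at merpay. The field names (`timestamp`, `severity`, `logger`, `message`, `payload`) and the helper names `JsonFormatter` and `build_logger` are assumptions.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured event fields are passed via extra={"payload": {...}}.
        payload = getattr(record, "payload", None)
        if payload is not None:
            entry["payload"] = payload
        return json.dumps(entry, sort_keys=True)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout (or the given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output through the root logger
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger


if __name__ == "__main__":
    log = build_logger("a-service")
    log.info("payment_created", extra={"payload": {"user_id": "u1", "amount": 100}})
```

Emitting one self-describing JSON line per event keeps the service unaware of Pub/Sub, Dataflow, and BigQuery; everything downstream of stdout stays the responsibility of the platform and data platform teams.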

Slide 7

Slide 7 text

Schema Registry
● Pre-define log schema in Protocol Buffers
  ○ It’s popular in mercari/merpay
  ○ There are some useful protoc plugins
● Manage .proto files on GitHub

Slide 8

Slide 8 text

Remaining issues
● Batch
  ○ Scalability
● Stream
  ○ Reliability of google-fluentd DaemonSet
● Schema Registry
  ○ Who defines and reviews the schemas?