[Figure: Platform and Data Platform connected by Transfer; teams involved: SRE / Infrastructure Team, Product Team, Data Team]
Direction of Dependence? Separation of Responsibility?
Platform Batch Transfer
• Transfer data periodically
• For example, all data are sent and fully replace the existing data in the data platform
• Or, data are sent in chunks periodically
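A minimal sketch of the full-replacement style, assuming a SQL target that supports transactional DDL. Python's built-in `sqlite3` stands in for the data platform, and the helper `full_replace` and the table names are hypothetical. New data are loaded into a staging table and swapped in inside one transaction, so readers never see a half-loaded table and a failed load leaves the old data in place.

```python
import sqlite3


def full_replace(conn: sqlite3.Connection, table: str, rows: list[tuple[int, str]]) -> None:
    """Atomically replace every row of `table` with `rows` (hypothetical helper).

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below control the transaction. Table names are
    interpolated directly, so they must be trusted identifiers.
    """
    staging = f"{table}_staging"
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {staging}")
        conn.execute(f"CREATE TABLE {staging} (id INTEGER PRIMARY KEY, status TEXT)")
        conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
        # Swap the fully loaded staging table into place.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"ALTER TABLE {staging} RENAME TO {table}")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # old table (if any) stays untouched
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    full_replace(conn, "orders", [(1, "new"), (2, "paid")])
    full_replace(conn, "orders", [(1, "paid"), (3, "new")])  # replaces, does not append
    print(conn.execute("SELECT * FROM orders ORDER BY id").fetchall())
    # [(1, 'paid'), (3, 'new')]
```

The second call replaces the first load entirely, which is the defining property of this style; the chunked variant would instead append or upsert each chunk.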
Batch Transfer
Snapshot Transfer
• Transfer the full dataset via a database snapshot
Query Transfer
• Capture changes with SQL queries over a JDBC connection
• Use an ETL service or a broker service
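Query transfer can be sketched as an incremental pull keyed on a high-watermark. This assumes the source table has a reliable last-modified column (`updated_at`, a hypothetical name) and uses `sqlite3` in place of a JDBC connection; in practice an ETL or broker service runs this loop and persists the watermark. The watermark is the pair `(updated_at, id)` so that rows sharing a timestamp are not skipped at chunk boundaries. Note that this kind of polling cannot observe hard deletes.

```python
import sqlite3


def pull_changes(src: sqlite3.Connection, watermark: tuple[str, int], batch_size: int = 1000):
    """Return rows changed after `watermark` and the advanced watermark (hypothetical helper)."""
    ts, last_id = watermark
    rows = src.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
        "ORDER BY updated_at, id LIMIT ?",
        (ts, ts, last_id, batch_size),
    ).fetchall()
    if rows:
        last = rows[-1]
        watermark = (last[2], last[0])  # (updated_at, id) of the last row read
    return rows, watermark


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)")
    src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
        (1, "new", "2024-01-01T00:00:00"),
        (2, "new", "2024-01-01T00:00:00"),
        (3, "paid", "2024-01-02T00:00:00"),
    ])
    wm = ("", 0)  # initial watermark sorts before every row
    while True:
        rows, wm = pull_changes(src, wm, batch_size=2)
        if not rows:
            break
        print(rows)  # in practice: write this chunk to the data platform
```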
Platform Streaming Transfer
• Transfer data as a stream
• Data in the data platform are updated in near real time
• Generally, change data capture (CDC) is used
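To show what the data platform does with CDC output, here is a minimal sketch that applies an ordered stream of change events to a target keyed by primary key. The event shape (`op`, `key`, `row`) is a hypothetical simplification; real CDC tools such as AWS DMS emit their own formats. Inserts and updates are treated as upserts and deletes tolerate missing keys, so replaying the same events leaves the target unchanged.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ChangeEvent:
    op: str                                # "insert", "update" or "delete" (hypothetical encoding)
    key: int                               # primary key of the changed row
    row: Optional[dict[str, Any]] = None   # row image after the change; None for deletes


def apply_changes(target: dict[int, dict[str, Any]], events: list[ChangeEvent]) -> None:
    """Apply events in commit order so that `target` mirrors the source table."""
    for ev in events:
        if ev.op in ("insert", "update"):
            target[ev.key] = dict(ev.row)   # upsert: safe to replay
        elif ev.op == "delete":
            target.pop(ev.key, None)        # idempotent delete
        else:
            raise ValueError(f"unknown op: {ev.op}")


if __name__ == "__main__":
    table: dict[int, dict[str, Any]] = {}
    apply_changes(table, [
        ChangeEvent("insert", 1, {"status": "new"}),
        ChangeEvent("update", 1, {"status": "paid"}),
        ChangeEvent("insert", 2, {"status": "new"}),
        ChangeEvent("delete", 2),
    ])
    print(table)  # {1: {'status': 'paid'}}
```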
Database Engine in recent versions, replication logs must be gathered from the writer instance
• In addition, a database user and the replication log settings must be configured
• So, the boundary between the two platforms is ambiguous
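Because CDC depends on settings owned by the product platform, a preflight check can make that shared boundary explicit. This sketch assumes a MySQL-compatible source and takes values you would read with `SHOW VARIABLES` plus privileges parsed from `SHOW GRANTS`. The required set below (`binlog_format=ROW`, `binlog_row_image=FULL`, `REPLICATION CLIENT` and `REPLICATION SLAVE`) reflects commonly documented AWS DMS prerequisites for MySQL sources; treat it as an assumption and confirm it against the current DMS documentation for your engine version.

```python
# Assumed DMS prerequisites for a MySQL-compatible source; verify against current docs.
REQUIRED_VARIABLES = {
    "binlog_format": "ROW",
    "binlog_row_image": "FULL",
}
REQUIRED_PRIVILEGES = {"REPLICATION CLIENT", "REPLICATION SLAVE"}


def preflight(variables: dict[str, str], privileges: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the source looks ready."""
    problems = []
    for name, expected in REQUIRED_VARIABLES.items():
        actual = variables.get(name)
        if actual is None or actual.upper() != expected:
            problems.append(f"{name} is {actual!r}, expected {expected!r}")
    granted = {p.upper() for p in privileges}
    for priv in sorted(REQUIRED_PRIVILEGES - granted):
        problems.append(f"missing privilege: {priv}")
    return problems


if __name__ == "__main__":
    for problem in preflight({"binlog_format": "MIXED", "binlog_row_image": "full"},
                             {"REPLICATION CLIENT"}):
        print(problem)
```

Running a check like this in both teams' pipelines gives the product and data teams one shared definition of "ready for CDC".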
Streaming Transfer + Batch Transfer
• Transfer data using both batch and streaming
• Benefit from both the certainty of batch-transferred data and the near-real-time freshness of streaming-transferred data
• Use with care because of its complexity
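One way to combine the two is to treat the latest batch snapshot as the base and overlay only the stream events committed after it. This is a minimal sketch assuming each event carries a monotonically increasing sequence number (hypothetical `seq`) and that the snapshot records the last sequence it includes; getting that cut-over point right is a large part of the complexity mentioned above.

```python
from typing import Any, Optional

Row = dict[str, Any]
Event = tuple[int, str, int, Optional[Row]]  # (seq, op, key, row), hypothetical shape


def merged_view(snapshot: dict[int, Row], snapshot_seq: int, events: list[Event]) -> dict[int, Row]:
    """Overlay stream events newer than `snapshot_seq` on top of a batch snapshot."""
    view = {k: dict(v) for k, v in snapshot.items()}  # copy: keep the batch result untouched
    for seq, op, key, row in sorted(events, key=lambda e: e[0]):
        if seq <= snapshot_seq:
            continue  # already reflected in the batch snapshot
        if op == "delete":
            view.pop(key, None)
        else:  # "insert" or "update"
            view[key] = dict(row)
    return view


if __name__ == "__main__":
    snapshot = {1: {"status": "new"}, 2: {"status": "new"}}  # batch result as of seq 10
    events = [
        (9, "update", 1, {"status": "paid"}),   # older than the snapshot: skipped
        (11, "update", 2, {"status": "paid"}),
        (12, "delete", 1, None),
    ]
    print(merged_view(snapshot, 10, events))  # {2: {'status': 'paid'}}
```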
Platform
• Manual Table Maintenance
◦ Generally performed with a manual failover or during a service maintenance window
• Sudden Failover
◦ Hard to predict
• Database Upgrade Operation
◦ Recently, Aurora Blue/Green Deployment is recommended to avoid service downtime.
Dive Deep into this pattern
to Redshift Serverless
[Figure: Amazon Aurora writer instance (v5.7) → binlog → AWS DMS → Amazon Redshift Serverless]
Anecdote: Oh no, transferring to Redshift Serverless with DMS Serverless is not supported!!
Operation Case: We'd like to upgrade the database engine version to MySQL 8.0 using Aurora Blue/Green Deployment
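The upgrade in this operation case can be scripted with the Amazon RDS API. This sketch calls boto3's `create_blue_green_deployment`, `describe_blue_green_deployments` and `switchover_blue_green_deployment` on an injected RDS client; the deployment name, cluster ARN, target version string (for example an Aurora MySQL 3 version such as `8.0.mysql_aurora.3.04.0`) and polling interval are placeholders. It covers only the RDS side; the CDC pipeline reading the binlog is out of scope here.

```python
import time
from typing import Any, Callable


def upgrade_with_blue_green(rds: Any, name: str, source_cluster_arn: str,
                            target_engine_version: str, poll_seconds: float = 60.0,
                            sleep: Callable[[float], None] = time.sleep) -> str:
    """Create a Blue/Green deployment, wait until green is ready, then switch over.

    `rds` is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    resp = rds.create_blue_green_deployment(
        BlueGreenDeploymentName=name,
        Source=source_cluster_arn,
        TargetEngineVersion=target_engine_version,
    )
    bg_id = resp["BlueGreenDeployment"]["BlueGreenDeploymentIdentifier"]

    # Poll until the green environment has finished provisioning.
    while True:
        status = rds.describe_blue_green_deployments(
            BlueGreenDeploymentIdentifier=bg_id
        )["BlueGreenDeployments"][0]["Status"]
        if status == "AVAILABLE":
            break
        if status == "INVALID_CONFIGURATION":
            raise RuntimeError(f"blue/green deployment {bg_id} failed: {status}")
        sleep(poll_seconds)

    # A real runbook would validate the green cluster (and dependent pipelines) here.
    rds.switchover_blue_green_deployment(BlueGreenDeploymentIdentifier=bg_id)
    return bg_id
```

In production you would pass `boto3.client("rds")`; injecting the client keeps the flow testable without AWS credentials.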
team and data team are different, these problems must be addressed during operation
• e.g., direct communication, maintaining runbooks, and so on
Direction of Dependence and Separation of Responsibility
• If team collaboration is difficult, it is important to carefully design the direction of dependence and the separation of responsibility
• Choose a simple data platform and technology
• Have the same team operate both platforms
[Figure: Amazon Aurora cluster (writer and reader instances) → cluster snapshot in Amazon S3 (Parquet files) → Amazon Athena; owner: Product and Data Platform Team]
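For the snapshot-to-S3 path, Amazon RDS can export an Aurora cluster snapshot to S3 in Parquet format through the `start_export_task` API, after which Athena can query the files. This sketch assumes an injected boto3 RDS client and that the bucket, IAM role and KMS key (all placeholders) already exist; the helper name and the export-identifier and prefix naming scheme are hypothetical.

```python
from datetime import date
from typing import Any


def export_snapshot(rds: Any, cluster: str, snapshot_arn: str, bucket: str,
                    role_arn: str, kms_key_id: str, day: date) -> str:
    """Start a snapshot export to S3 and return its identifier.

    `rds` is expected to be boto3.client("rds").
    """
    export_id = f"{cluster}-{day:%Y%m%d}"  # hypothetical naming: one export per day
    rds.start_export_task(
        ExportTaskIdentifier=export_id,
        SourceArn=snapshot_arn,
        S3BucketName=bucket,
        S3Prefix=f"{cluster}/{day:%Y/%m/%d}",  # keep each day's export under its own prefix
        IamRoleArn=role_arn,
        KmsKeyId=kms_key_id,
    )
    return export_id
```

Since one team owns both the Aurora cluster and this export, there is no cross-team contract to maintain beyond the job itself, which is the point of this pattern.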
• If data freshness is important, build a CDC platform using simple managed, serverless services
• Have the product and data teams collaborate
[Figure: Amazon Aurora cluster (writer and reader instances) → replication log → AWS DMS → Amazon Redshift Serverless; Product Platform Team and Data Platform Team linked by a dependence]
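A managed CDC pipeline like this is mostly configuration. The sketch below builds AWS DMS table mappings (one selection rule) and creates a `full-load-and-cdc` replication task through boto3's `create_replication_task`. The schema name, task identifier and ARNs are placeholders, the helper names are hypothetical, and it assumes a provisioned replication instance rather than DMS Serverless, in line with the limitation described in the anecdote.

```python
import json
from typing import Any


def table_mappings(schema: str) -> str:
    """DMS table-mapping JSON that includes every table in `schema`."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all-tables",
            "object-locator": {"schema-name": schema, "table-name": "%"},
            "rule-action": "include",
        }]
    })


def create_cdc_task(dms: Any, task_id: str, source_arn: str, target_arn: str,
                    instance_arn: str, schema: str) -> str:
    """Create a full-load-and-CDC task; `dms` is expected to be boto3.client("dms")."""
    resp = dms.create_replication_task(
        ReplicationTaskIdentifier=task_id,
        SourceEndpointArn=source_arn,
        TargetEndpointArn=target_arn,
        ReplicationInstanceArn=instance_arn,
        MigrationType="full-load-and-cdc",  # initial copy, then ongoing changes
        TableMappings=table_mappings(schema),
    )
    return resp["ReplicationTask"]["ReplicationTaskArn"]
```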
size
• Design the dependency direction so that the data platform refers to the product platform
[Figure: a JDBC Connector on Amazon MSK reads the Amazon Aurora cluster (writer and reader instances) with SQL and feeds Amazon Redshift Serverless; the Data Platform Team depends on the Product Platform Team]
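In this pull-based design the data platform owns the connector and only needs read access to the product database. A sketch of the connector configuration, assuming the Confluent JDBC source connector running on Kafka Connect (for example MSK Connect); the helper name, host, database, user and column names are placeholders, and the password is left to a secrets provider.

```python
def jdbc_source_config(reader_host: str, database: str, tables: list[str]) -> dict[str, str]:
    """Kafka Connect config that polls the product database via SQL (hypothetical helper)."""
    return {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "1",
        # Point at a reader endpoint so polling does not load the writer.
        "connection.url": f"jdbc:mysql://{reader_host}:3306/{database}",
        "connection.user": "data_platform_reader",  # hypothetical read-only user
        # connection.password: supply via a secrets/config provider (omitted here)
        "table.whitelist": ",".join(tables),
        # Detect new and updated rows via an increasing id plus a last-modified timestamp.
        "mode": "timestamp+incrementing",
        "incrementing.column.name": "id",
        "timestamp.column.name": "updated_at",
        "topic.prefix": f"{database}.",
        "poll.interval.ms": "60000",
    }
```

With this direction of dependence, the product team's obligation narrows to keeping the reader endpoint and the `id`/`updated_at` columns stable, while the data team owns everything downstream.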
of product
• Size and capability of the team
• Technology stack that can be adopted
It is important to:
• design the team and the platform together
• be aware of the separation of responsibilities and the direction of dependencies.