AWS Data Migration Service (DMS)

AWS Data Migration Service (DMS) - Amazon’s Approach To Change Data Capture (CDC)

Ashik Uzzaman

January 19, 2019

Transcript

  1. AWS Data Migration Service (DMS): Amazon’s Approach To Change Data Capture (CDC)

    Presented By Ashik Uzzaman, Sr Software Engineer, Roku
    Hosted By Java User Group Bangladesh (JUGBD)
    Date: 19th January 2019
  2. Change Data Capture (CDC)

    • Change data capture (CDC) is an approach to data integration that is based on the identification, capture and delivery of the changes made to enterprise data sources.
    • The changes are insert, update or delete operations on tables.
    • CDC solutions occur most often in data-warehouse environments to capture and preserve the state of data across time.
    • CDC is also used in data migration projects.
  3. 2 Popular Ways to Track Changes

    • Tracking changes using database triggers - May include a publish/subscribe pattern to communicate the changed data to multiple targets. In this approach, triggers log events that happen to the transactional table into another queue table that can later be "played back".
    • Reading the transaction log as, or shortly after, it is written - Most database management systems manage a transaction log that records changes made to the database contents and to metadata. By scanning and interpreting the contents of the database transaction log one can capture the changes made to the database in a non-intrusive manner.
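The trigger-based approach can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not how any particular CDC product works: the `customers` table, the `customers_changes` queue table, and the `play_back` helper are all hypothetical names chosen for the example, and the "target" is just a dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- Queue table that the triggers append to (hypothetical name and layout).
CREATE TABLE customers_changes (
    seq  INTEGER PRIMARY KEY AUTOINCREMENT,
    op   TEXT NOT NULL,
    id   INTEGER NOT NULL,
    name TEXT
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('I', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('U', NEW.id, NEW.name);
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, name) VALUES ('D', OLD.id, NULL);
END;
""")

# Ordinary application writes; the triggers record each one in the queue table.
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO customers VALUES (2, 'Bob')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")
conn.commit()


def play_back(source, target):
    """Apply queued changes to a target dict in the order they were logged."""
    rows = source.execute(
        "SELECT op, id, name FROM customers_changes ORDER BY seq"
    )
    for op, row_id, name in rows:
        if op == "D":
            target.pop(row_id, None)
        else:  # 'I' and 'U' both leave the row at its latest value
            target[row_id] = name
    return target


replica = play_back(conn, {})
print(replica)
```

Every write to `customers` now costs an extra insert inside the same transaction, which is the overhead that the log-reading approach avoids.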
  4. Database Migration Service (DMS)

    AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud, between on-premises instances (through an AWS Cloud setup), or between combinations of cloud and on-premises setups. With support for all major relational and NoSQL databases, DMS is a fitting choice for many data migration projects. In this seminar, we will touch on Amazon’s choice to design DMS around database transaction logs, as well as how you can use Terraform for DMS.
  5. Database Migration Service (DMS)

    • Homogeneous and heterogeneous migrations
    • Uses log scanners that read the database transaction log
    • Continuous replication
    • Replication happens table by table
    • No indexes, triggers, or foreign key constraints during full load
    • Consolidation of shards into a target database cluster
    • Replication instance
    • Selection and transformation rules for DMS tasks
    • Source and target endpoints
    • 5 TB, and up to 15 TB, migrated successfully; about 12 hours per TB to replicate
    • Sysadmin privilege for the dmsuser
    • Either the source or the target endpoint must be in AWS
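Selection and transformation rules are passed to a DMS task as a JSON "table mappings" document. The sketch below builds one in Python for the shard-consolidation case: it includes every table in one shard's schema and renames that schema on the target. The schema names `shard1` and `app` and the helper function names are assumptions made up for the example; the rule keys follow the table-mapping format as I understand it from the AWS documentation, so check them against the current docs before use.

```python
import json


def selection_rule(rule_id, schema, table="%"):
    """Include the matching tables; '%' is the wildcard for all tables."""
    return {
        "rule-type": "selection",
        "rule-id": str(rule_id),
        "rule-name": f"include-{schema}",
        "object-locator": {"schema-name": schema, "table-name": table},
        "rule-action": "include",
    }


def rename_schema_rule(rule_id, schema, new_name):
    """Rename the source schema so that several shards land in one target schema."""
    return {
        "rule-type": "transformation",
        "rule-id": str(rule_id),
        "rule-name": f"rename-{schema}",
        "rule-target": "schema",
        "object-locator": {"schema-name": schema},
        "rule-action": "rename",
        "value": new_name,
    }


def shard_table_mappings(shard_schema, target_schema):
    """Table mappings for one shard: select everything, then rename the schema."""
    return {
        "rules": [
            selection_rule(1, shard_schema),
            rename_schema_rule(2, shard_schema, target_schema),
        ]
    }


# Hypothetical shard and target schema names.
mappings = shard_table_mappings("shard1", "app")
print(json.dumps(mappings, indent=2))
```

One mapping document per shard keeps each task small, which fits the table-by-table way DMS replicates.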
  6. Why DMS?

    • Near zero downtime migration
    • Secure
    • Easy to support migration from shards
    • Allows DB freedom (any supported source to target)
    • Cost effective
    • Repeatable
  7. Schema Conversion Tool (SCT)

    • The AWS Schema Conversion Tool makes heterogeneous database migrations predictable by automatically converting the source database schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target database.
    • Any objects that cannot be automatically converted are clearly marked so that they can be manually converted to complete the migration.
  8. Flyway

    • Flyway is an open source tool that makes data migration easy.
    • Think of Flyway as version control for your database.
    • It lets you evolve your database schema easily and reliably across all your instances.
  9. Challenges for Transaction Log

    • Coordinating the reading of the transaction logs and the archiving of log files (database management software typically archives log files off-line on a regular basis).
    • Translation between physical storage formats that are recorded in the transaction logs and the logical formats typically expected by database users (e.g., some transaction logs save only minimal buffer differences that are not directly useful for change consumers).
    • Dealing with changes to the format of the transaction logs between versions of the database management system.
    • Eliminating uncommitted changes that the database wrote to the transaction log and later rolled back.
    • Dealing with changes to the metadata of tables in the database.
    • Amazon offers no SLA on the maximum time data replication may take.
  10. Advantages for Transaction Log

    • Minimal impact on the database (even more so if one uses log shipping to process the logs on a dedicated host).
    • No need for programmatic changes to the applications that use the database.
    • Low latency in acquiring changes.
    • Transactional integrity: log scanning can produce a change stream that replays the original transactions in the order they were committed. Such a change stream includes changes made to all tables participating in the captured transaction.
    • No need to change the database schema.
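The idea of a change stream that replays transactions in commit order, while discarding work that was later rolled back, can be sketched in a few lines of Python. The record layout here (`tx`, `kind`, `table`, `op`, `row`) is invented for the example; real transaction logs use physical, vendor-specific formats, and this sketch says nothing about how DMS itself parses them.

```python
from collections import defaultdict


def committed_changes(log_records):
    """Yield (txid, changes) for committed transactions, in commit order.

    Changes are buffered per transaction until a commit record is seen.
    A rollback record discards the buffer, so uncommitted work never
    reaches the consumer.
    """
    pending = defaultdict(list)
    for record in log_records:
        txid, kind = record["tx"], record["kind"]
        if kind == "change":
            pending[txid].append((record["table"], record["op"], record["row"]))
        elif kind == "commit":
            yield txid, pending.pop(txid, [])
        elif kind == "rollback":
            pending.pop(txid, None)


# A hypothetical, already-decoded log with three interleaved transactions.
log = [
    {"tx": 1, "kind": "change", "table": "orders", "op": "insert", "row": {"id": 10}},
    {"tx": 2, "kind": "change", "table": "orders", "op": "insert", "row": {"id": 11}},
    {"tx": 2, "kind": "change", "table": "items", "op": "insert", "row": {"order_id": 11}},
    {"tx": 3, "kind": "change", "table": "orders", "op": "delete", "row": {"id": 7}},
    {"tx": 2, "kind": "commit"},
    {"tx": 3, "kind": "rollback"},
    {"tx": 1, "kind": "commit"},
]

stream = list(committed_changes(log))
for txid, changes in stream:
    print(txid, changes)
```

Transaction 2 is emitted before transaction 1 because it committed first, its changes to both `orders` and `items` stay together, and transaction 3 never appears.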
  11. DMS Good Practices

    • Share common source endpoints and target endpoints across multiple teams and projects.
    • Share DMS instances across multiple teams and projects unless you have some resource-hungry DMS tasks.
    • Decide whether full load, ongoing replication, or full load with ongoing replication is the correct setup for your DMS tasks. Unless your use case is very straightforward, it pays to split your data migration into two separate tasks: one for full load and another for ongoing replication only.
    • Decide whether you want to use DROP_AND_CREATE, TRUNCATE_BEFORE_LOAD or DO_NOTHING as the target table preparation mode for your full load DMS tasks. For sharded source databases that will merge into a single destination database in Aurora, it's convenient to set up the first shard's task as DROP_AND_CREATE or TRUNCATE_BEFORE_LOAD. For the second shard, it should be DO_NOTHING, so that the first shard's data is not wiped.
    • During full load, it's better to disable validation and logging, as they significantly slow down the copy process. If you need them, consider using a very large DMS instance.
    • Once deployed, configure the DMS tasks in Terraform to ignore changes on replication_task_settings through the lifecycle block.
    • Do lots of dry runs.
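The practices above about splitting tasks and choosing a table preparation mode per shard can be sketched as a small planning function. The shard names, task ids, and function names are hypothetical. The setting keys (`FullLoadSettings.TargetTablePrepMode`, `Logging.EnableLogging`, `ValidationSettings.EnableValidation`), the value `DROP_AND_CREATE`, and the migration types `full-load` and `cdc` are taken from the AWS DMS documentation as I understand it. Turning logging and validation back on for the ongoing-replication task is an assumption of this sketch, not something the slide states.

```python
import json


def full_load_settings(prep_mode):
    """Task settings for a full load: no logging or validation, to keep the copy fast."""
    return {
        "FullLoadSettings": {"TargetTablePrepMode": prep_mode},
        "Logging": {"EnableLogging": False},
        "ValidationSettings": {"EnableValidation": False},
    }


def cdc_settings():
    """Task settings for ongoing replication (assumed: logging and validation on)."""
    return {
        "Logging": {"EnableLogging": True},
        "ValidationSettings": {"EnableValidation": True},
    }


def plan_shard_tasks(shards):
    """Two tasks per shard: a full load and a separate ongoing-replication task.

    Only the first shard may drop and recreate target tables; later shards
    use DO_NOTHING so they do not wipe rows already loaded.
    """
    tasks = []
    for index, shard in enumerate(shards):
        prep_mode = "DROP_AND_CREATE" if index == 0 else "DO_NOTHING"
        tasks.append({
            "task_id": f"{shard}-full-load",
            "migration_type": "full-load",
            "settings": json.dumps(full_load_settings(prep_mode)),
        })
        tasks.append({
            "task_id": f"{shard}-cdc",
            "migration_type": "cdc",
            "settings": json.dumps(cdc_settings()),
        })
    return tasks


tasks = plan_shard_tasks(["shard1", "shard2"])
for task in tasks:
    print(task["task_id"], task["migration_type"], task["settings"])
```

The order of the shards matters: whichever shard is listed first is the only one allowed to reset the target tables.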
  12. Terraform

    • Terraform is infrastructure-as-code software by HashiCorp. It lets users define data center infrastructure in a high-level configuration language, from which it creates an execution plan to build that infrastructure on platforms such as OpenStack or with service providers such as AWS, IBM Cloud (formerly Bluemix), Google Cloud Platform, Linode, Microsoft Azure, Oracle Cloud Infrastructure, or VMware vSphere.
    • Infrastructure is defined in HCL (HashiCorp Configuration Language) or in JSON.
    • Terraform supports most DMS- and RDS-related operations.
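Because Terraform also accepts JSON, a DMS task definition can be generated from a script. The sketch below writes out what a `.tf.json` document for one `aws_dms_replication_task` might look like, including a `lifecycle` block that ignores later changes to `replication_task_settings`. The resource and attribute names are from the Terraform AWS provider as I understand it; the local names (`shard1_full_load`, `shared`, `shard1`, `aurora`) and the referenced instance and endpoint resources are hypothetical and would have to exist elsewhere in the configuration.

```python
import json

# Minimal table mappings: include every table in a hypothetical "shard1" schema.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-shard1",
        "object-locator": {"schema-name": "shard1", "table-name": "%"},
        "rule-action": "include",
    }]
}

task = {
    "replication_task_id": "shard1-full-load",
    "migration_type": "full-load",
    # In Terraform's JSON syntax, references are written as "${...}" strings.
    "replication_instance_arn":
        "${aws_dms_replication_instance.shared.replication_instance_arn}",
    "source_endpoint_arn": "${aws_dms_endpoint.shard1.endpoint_arn}",
    "target_endpoint_arn": "${aws_dms_endpoint.aurora.endpoint_arn}",
    # Both of these arguments take a JSON document encoded as a string.
    "table_mappings": json.dumps(table_mappings),
    "replication_task_settings": json.dumps(
        {"FullLoadSettings": {"TargetTablePrepMode": "DROP_AND_CREATE"}}
    ),
    # Stop Terraform from reporting drift on the task settings after deployment.
    "lifecycle": {"ignore_changes": ["replication_task_settings"]},
}

config = {"resource": {"aws_dms_replication_task": {"shard1_full_load": task}}}
rendered = json.dumps(config, indent=2)
print(rendered)
```

Saving `rendered` to a file such as `dms_task.tf.json` next to the rest of the configuration would let `terraform plan` pick it up; treat the output as a starting point to review, not a verified configuration.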
  13. Acknowledgements

    • AWS DMS team
    • My colleagues at Roku
    • Bazlur Rahman Rokon
    • JUGBD members and today’s audience