How to make DR-ready system with Amazon Aurora Global Database

APJ Open Mic - May (in AWS Community Builders)

みのるん

May 11, 2023

Transcript

  1. How to make DR-ready system with Amazon Aurora Global Database

    Minoru Onda (@minorun365), Software Engineer, KDDI Corporation & KDDI Agile Development Center Corporation
    AWS Community Builders, APJ Open Mic (May)
  2. $ whoami

    > Minoru Onda (@minorun365)
    Software Engineer / KDDI & KAG (concurrently)
    Co-lead in communities
    • JAWS-UG SRE Branch
    • JAWS DAYS 2022
    Awards
    • KDDI Cloud SAMURAI 2021
    • KDDI Cloud Ambassadors 2021
  3. Amazon Aurora is a cloud-native RDB

    • Distributed instances
    • Fast replication in 3-AZ storage
    • Autoscaling of replicas and volumes
    docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html
  4. Where I am using Aurora

    Hosted on EKS and connected to many on-prem systems
    [Diagram labels: VPC, Tokyo region, EKS on EC2, Aurora, iPads in shops, private networks, on-prem workloads, DX (Direct Connect)]
  5. Started DR planning after Direct Connect outage

    We faced a huge Direct Connect (DX) outage in 2021 and started multi-region planning
    [Diagram labels: VPC, Tokyo region, EKS on EC2, Aurora, iPads in shops, private networks, on-prem workloads, DX (Direct Connect)]
  6. Then, how to enable BCP?

    © 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  7. Basic strategies of disaster recovery (DR) on AWS

    Clarify your requirements!
    • Recovery time objective (RTO)
    • Recovery point objective (RPO)
    aws.amazon.com/jp/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/
  8. Basic strategies of disaster recovery (DR) on AWS

    Select the plan that best suits your requirements
    • Backup & restore
    • Pilot light
    • Warm standby
    • Multi-site active/active
    aws.amazon.com/jp/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/
  9. Basic strategies of disaster recovery (DR) on AWS

    Select the plan that best suits your requirements
    • Backup & restore
    • Pilot light
    • Warm standby
    • Multi-site active/active
    We chose it! ☝
    aws.amazon.com/jp/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/
  10. Why is a database difficult in multi-region?
  11. Difficulties with multiple databases

    When you have multiple DBs, for example:
    • To get high availability
    • To make the system DR-ready
    • To enable Blue/Green deployment
    “You build multi-DB, you run it consistently!”
    (Photo: Werner Vogels, from Wikipedia. He never actually said this.)
  12. Difficulties with multiple databases

    Popular solutions for keeping DBs consistent:
    From the application
    • Two-phase commit (2PC)
    • Saga pattern
    From the database
    • Logical replication
    • Physical replication
  13. Difficulties with multiple databases

    Popular solutions for keeping DBs consistent:
    From the application
    • Two-phase commit (2PC)
    • Saga pattern
    From the database
    • Logical replication
    • Physical replication ☝ You can use it easily with Amazon Aurora!
  14. What is Global Database?
  15. Amazon Aurora Global Database

    • Storage replication across regions
    • High performance with physical volume replication
    [Diagram: Aurora cluster (primary) in the Tokyo region, with writer & readers and cluster volumes; outbound replication to an Aurora cluster (secondary) in the Osaka region, with readers and cluster volumes]
  16. Amazon Aurora Global Database

    Supported engines:
    On Aurora MySQL
    • Ver 2.11+ (minor versions)
    On Aurora PostgreSQL
    • Ver 11.17+ (minor versions)
    • Ver 12.12+ (minor versions)
    • Ver 13.8+ (minor versions)
    • Ver 14.5+ (minor versions)
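
The version list on this slide can be turned into a quick pre-check. The following is a minimal sketch that only encodes the minimums shown on the slide (as of May 2023); the function name `is_listed_version` and the table layout are illustrative assumptions, and major versions not on the slide simply return False, so the AWS documentation remains the place to confirm current support.

```python
# Minimum minor version per major line, copied from the slide (May 2023).
# Engine keys follow the RDS engine names; the table itself is illustrative.
MIN_MINOR_BY_MAJOR = {
    "aurora-mysql": {2: 11},
    "aurora-postgresql": {11: 17, 12: 12, 13: 8, 14: 5},
}


def is_listed_version(engine: str, version: str) -> bool:
    """Return True if `version` meets the minimum the slide lists for its major line.

    `version` is assumed to look like "13.8" or "2.11.2" (major.minor[.patch]).
    """
    parts = version.split(".")
    major, minor = int(parts[0]), int(parts[1])
    minimum = MIN_MINOR_BY_MAJOR.get(engine, {}).get(major)
    # Major lines that the slide does not mention return False here.
    return minimum is not None and minor >= minimum


if __name__ == "__main__":
    for engine, version in [("aurora-postgresql", "13.8"), ("aurora-mysql", "2.10.3")]:
        print(engine, version, is_listed_version(engine, version))
```
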
  17. Demo 1: Enable Global Database
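
The demo itself is not captured in the transcript. As a rough equivalent, here is a sketch of how the same setup might be scripted with the boto3 RDS client. The function name, identifiers and instance naming are hypothetical; the three client calls (`create_global_cluster`, `create_db_cluster`, `create_db_instance`) are existing boto3 operations, but a real setup would also need to wait for each resource to become available and, for encrypted clusters, pass a KMS key for the secondary region.

```python
from typing import Any


def enable_global_database(
    primary_rds: Any,          # RDS client for the primary region (e.g. Tokyo)
    secondary_rds: Any,        # RDS client for the secondary region (e.g. Osaka)
    global_id: str,            # hypothetical name for the new global cluster
    primary_cluster_arn: str,  # ARN of the existing Aurora cluster
    secondary_cluster_id: str,
    engine: str,
    engine_version: str,
    instance_class: str,
) -> None:
    """Sketch: wrap an existing cluster in a global database and add a secondary region."""
    # 1. Create the global cluster with the existing cluster as its primary.
    primary_rds.create_global_cluster(
        GlobalClusterIdentifier=global_id,
        SourceDBClusterIdentifier=primary_cluster_arn,
    )
    # 2. Create the secondary cluster in the other region and attach it.
    secondary_rds.create_db_cluster(
        DBClusterIdentifier=secondary_cluster_id,
        Engine=engine,
        EngineVersion=engine_version,
        GlobalClusterIdentifier=global_id,
    )
    # 3. Add one reader instance so the secondary can serve queries.
    secondary_rds.create_db_instance(
        DBInstanceIdentifier=f"{secondary_cluster_id}-reader-1",
        DBClusterIdentifier=secondary_cluster_id,
        Engine=engine,
        DBInstanceClass=instance_class,
    )
```

With boto3 installed, the two clients would typically be created as `boto3.client("rds", region_name="ap-northeast-1")` for Tokyo and `boto3.client("rds", region_name="ap-northeast-3")` for Osaka. Passing the clients in as arguments keeps the function easy to dry-run with a stub.
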
  18. Two options for Global Database failover
  19. What should you do on disaster?

    We have 2 options for failing over regions in Global Database
    Planned failover (managed)
    • One click on the console
    • Cannot be used in an emergency
    Unplanned failover
    • Manual operation with steps
    • Available even during a disaster
  20. What should you do on disaster?

    We have 2 options for failing over regions in Global Database
    Planned failover (managed)
    • One click on the console
    • Cannot be used in an emergency
    Unplanned failover
    • Manual operation with steps
    • Available even during a disaster
    👇 Use it first!
  21. Operation steps on disaster

    When you run Global Database on Tokyo (primary) and Osaka (secondary) ...
    [Diagram: Aurora cluster (primary) in the Tokyo region, marked "Primary"; Aurora cluster (secondary) in the Osaka region]
  22. Operation steps on disaster

    Disaster occurred!
    [Diagram: Aurora cluster (primary) in the Tokyo region; Aurora cluster (secondary) in the Osaka region]
  23. Operation steps on disaster

    Then you should remove the secondary cluster from Global Database
    [Diagram: Aurora cluster (primary) in the Tokyo region; Aurora cluster (standalone) in the Osaka region, labeled "Remove from Global DB"]
  24. Operation steps after disaster

    After the disaster has passed, you can rebuild Global Database from Osaka (new primary)
    [Diagram labels: Aurora cluster (old), Tokyo region; Aurora cluster (primary), Osaka region, marked "Primary"; Aurora cluster (secondary); "Rebuild GDB"]
  25. Demo 2: Unplanned failover on emergency
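
The demo is not captured in the transcript. The detach step from the preceding slides ("Remove from Global DB") might look roughly like this with boto3. This is a sketch under assumptions: the function name and polling parameters are hypothetical, while `remove_from_global_cluster` and `describe_db_clusters` are existing RDS client operations. The client should point at the surviving (Osaka) region.

```python
import time
from typing import Any, Callable


def detach_and_wait(
    rds_secondary: Any,            # RDS client for the surviving region (Osaka)
    global_id: str,
    secondary_cluster_arn: str,    # the detach call takes the cluster ARN
    secondary_cluster_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sketch: detach the secondary so it becomes a standalone, writable cluster.

    Returns True once the cluster reports "available", False if polling gives up.
    """
    rds_secondary.remove_from_global_cluster(
        GlobalClusterIdentifier=global_id,
        DbClusterIdentifier=secondary_cluster_arn,
    )
    for _ in range(max_polls):
        clusters = rds_secondary.describe_db_clusters(
            DBClusterIdentifier=secondary_cluster_id
        )["DBClusters"]
        if clusters and clusters[0]["Status"] == "available":
            return True
        sleep(poll_seconds)
    return False
```

Data written to the old primary but not yet replicated at the moment of detachment is not present on the promoted cluster, which is why the replication lag is worth checking before this step.
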
  26. Operation steps after disaster

    After the disaster has passed, you can rebuild Global Database from Osaka (new primary)
    [Diagram labels: Aurora cluster (old), Tokyo region; Aurora cluster (primary), Osaka region, marked "Primary"; Aurora cluster (secondary); "Rebuild GDB"]
  27. Operation steps after disaster

    On a peaceful day, you can switch the regions back with managed planned failover
    [Diagram labels: Aurora cluster (old), Tokyo region; Aurora cluster (secondary), Osaka region; Aurora cluster (primary), marked "Primary"; "Managed failover"]
  28. Demo 3: Planned failover (managed)
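
The console demo is not captured in the transcript. The same managed planned failover could be requested through the API, roughly as sketched below. The helper names are hypothetical; `describe_global_clusters` and `failover_global_cluster` are existing boto3 RDS operations. The pre-check is an added assumption meant to catch an obviously wrong target before the call is made.

```python
from typing import Any, Optional


def find_member(rds: Any, global_id: str, cluster_arn: str) -> Optional[dict]:
    """Return the global cluster member entry for `cluster_arn`, or None."""
    response = rds.describe_global_clusters(GlobalClusterIdentifier=global_id)
    for global_cluster in response.get("GlobalClusters", []):
        for member in global_cluster.get("GlobalClusterMembers", []):
            if member.get("DBClusterArn") == cluster_arn:
                return member
    return None


def managed_planned_failover(rds: Any, global_id: str, target_cluster_arn: str) -> None:
    """Sketch: ask Aurora to make `target_cluster_arn` the new primary."""
    member = find_member(rds, global_id, target_cluster_arn)
    if member is None:
        raise ValueError("target cluster is not a member of the global database")
    if member.get("IsWriter"):
        raise ValueError("target cluster is already the primary")
    rds.failover_global_cluster(
        GlobalClusterIdentifier=global_id,
        TargetDbClusterIdentifier=target_cluster_arn,
    )
```
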
  29. What I learned by using Global Database in production
  30. Make a DR operation flowchart beforehand!

    Disaster occurred!
    → Decide to activate DR
    → Check health of Osaka cluster using SQL
    → Shut requests out by enabling redirect
  31. Make a DR operation flowchart beforehand!

    Disaster occurred!
    → Decide to activate DR
    → Check health of Osaka cluster using SQL
    → Shut requests out by enabling redirect
    → Check replication lag
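
The "Check replication lag" step could be scripted roughly as follows. This sketch reads the CloudWatch metric `AuroraGlobalDBReplicationLag` (reported in milliseconds) through `get_metric_statistics`; the function names, the 5-minute window and the use of the `DBClusterIdentifier` dimension for the secondary cluster are assumptions to verify against your own metrics.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def max_replication_lag_ms(
    cloudwatch: Any,             # CloudWatch client for the secondary region
    secondary_cluster_id: str,
    minutes: int = 5,
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Return the worst replication lag (ms) over the last `minutes`, or None if no data."""
    end = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="AuroraGlobalDBReplicationLag",
        Dimensions=[{"Name": "DBClusterIdentifier", "Value": secondary_cluster_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    return max(point["Maximum"] for point in datapoints)


def within_rpo(lag_ms: Optional[float], rpo_seconds: float) -> bool:
    """Treat missing data as a failure so a human takes a look."""
    return lag_ms is not None and lag_ms <= rpo_seconds * 1000
```
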
  32. Make a DR operation flowchart beforehand!

    Disaster occurred!
    → Decide to activate DR
    → Check health of Osaka cluster using SQL
    → Shut requests out by enabling redirect
    → Check replication lag
    → Modify DNS record to switch Aurora endpoints
  33. Make a DR operation flowchart beforehand!

    Disaster occurred!
    → Decide to activate DR
    → Check health of Osaka cluster using SQL
    → Shut requests out by enabling redirect
    → Check replication lag
    → Modify DNS record to switch Aurora endpoints
    → Unplanned failover on Global Database
  34. Make a DR operation flowchart beforehand!

    Disaster occurred!
    → Decide to activate DR
    → Check health of Osaka cluster using SQL
    → Shut requests out by enabling redirect
    → Check replication lag
    → Modify DNS record to switch Aurora endpoints
    → Unplanned failover on Global Database
    → Check health of app on Osaka
  35. Make a DR operation flowchart beforehand!

    Disaster occurred!
    → Decide to activate DR
    → Check health of Osaka cluster using SQL
    → Shut requests out by enabling redirect
    → Check replication lag
    → Modify DNS record to switch Aurora endpoints
    → Unplanned failover on Global Database
    → Check health of app on Osaka
    within RTO
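
One way to make such a flowchart executable is a small runbook runner that performs the steps in order, stops at the first failure and reports whether the whole run fit within the RTO. This is a sketch and not the speaker's Exastro setup: the `Step` type, `run_runbook` and the placeholder actions are hypothetical, and the step order simply follows the slide.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True on success


def run_runbook(
    steps: List[Step],
    rto_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[Tuple[str, bool, float]], bool]:
    """Run steps in order; return (log, success_within_rto).

    The log holds (step name, succeeded, seconds elapsed since start).
    Execution stops at the first failed step so a human can decide what to do.
    """
    start = clock()
    log: List[Tuple[str, bool, float]] = []
    for step in steps:
        ok = step.action()
        log.append((step.name, ok, clock() - start))
        if not ok:
            break
    total = clock() - start
    completed = len(log) == len(steps) and all(ok for _, ok, _ in log)
    return log, completed and total <= rto_seconds


# Step names taken from the slide; real actions would replace the placeholders.
SLIDE_STEPS = [
    "Decide to activate DR",
    "Check health of Osaka cluster using SQL",
    "Shut requests out by enabling redirect",
    "Check replication lag",
    "Modify DNS record to switch Aurora endpoints",
    "Unplanned failover on Global Database",
    "Check health of app on Osaka",
]

if __name__ == "__main__":
    placeholder_steps = [Step(name, lambda: True) for name in SLIDE_STEPS]
    log, ok = run_runbook(placeholder_steps, rto_seconds=3600)
    for name, succeeded, elapsed in log:
        print(f"{elapsed:8.3f}s  {'OK  ' if succeeded else 'FAIL'}  {name}")
    print("within RTO:", ok)
```

Injecting the clock keeps the runner testable in drills without waiting in real time.
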
  36. You will make mistakes in an emergency. Automate it!

    In my case...
    • Automated all the operations with Exastro (Japanese software)
    • Separated operations by group, making it easy to adapt flexibly to the situation
    • Using Ansible for the included operations on on-prem network components
  37. And practice DR operations regularly!

    If you can, plan regular training with all the stakeholders
    • Operations team
    • Infrastructure developers (including DBA)
    • Application developers
    • Management (who can decide to activate DR)
  38. Thank you!

    Minoru Onda
    How to make DR-ready system with Amazon Aurora Global Database
    Give me your feedback!