How to make DR-ready system with Amazon Aurora Global Database

APJ Open Mic - May (in AWS Community Builders)

みのるん

May 11, 2023

Transcript

  1. How to make DR-ready system
    with Amazon Aurora Global Database
    Minoru Onda @minorun365
    Software engineer
    KDDI Corporation & KDDI Agile Development Center Corporation
    AWS COMMUNITY BUILDERS
    APJ OPEN MIC – MAY

  2. > Minoru Onda @minorun365
    Software Engineer / KDDI & KAG (concurrently)
    Co-lead in communities
    • JAWS-UG SRE Branch
    • JAWS DAYS 2022
    Awards
    • KDDI Cloud SAMURAI 2021
    • KDDI Cloud Ambassadors 2021
    $ whoami

  3. Do you love Amazon Aurora?

  4. Of course I love it!
    😍

  5. • Distributed instances
    • Fast replication in 3-AZ storage
    • Autoscaling of replicas and volumes
    docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Overview.html
    Amazon Aurora is a cloud-native RDB

  6. Customer service application in mobile shops
    Where I am using Aurora

  7. Hosted on EKS and connecting with many on-prem systems
    Where I am using Aurora
    VPC
    Tokyo region
    EKS on EC2
    Aurora
    iPads in shops
    Private network
    Private network
    On-prem workloads
    DX
    DX

  8. We faced a huge DX outage in 2021 and started multi-region planning
    Started DR planning after Direct Connect outage
    VPC
    Tokyo region
    EKS on EC2
    Aurora
    iPads in shops
    Private network
    Private network
    On-prem workloads
    DX
    DX

  9. © 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.
    Then, how to enable BCP?

  10. Clarify your requirements!
    • Recovery time objective (RTO)
    • Recovery point objective (RPO)
    Basic strategies of disaster recovery (DR) on AWS
    aws.amazon.com/jp/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/

  11. Select the plan that best suits your requirements
    • Backup & restore
    • Pilot light
    • Warm standby
    • Multi-site active/active
    Basic strategies of disaster recovery (DR) on AWS
    aws.amazon.com/jp/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/

  12. Select the plan that best suits your requirements
    • Backup & restore
    • Pilot light
    • Warm standby
    • Multi-site active/active
    We chose it!
    Basic strategies of disaster recovery (DR) on AWS
    aws.amazon.com/jp/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/

  13. And don’t worry!
    It’s easy to make your system
    multi-region 😚

  14. And don’t worry!
    It’s easy to make your system
    multi-region 😚
    ...except for the database 🥶

  15. Why is the database difficult in multi-region?

  16. When you have multiple DBs, for example...
    • To get high availability
    • To be DR-ready
    • To enable Blue/Green deployment
    “You build multi-DB,
    you run it consistently!”
    Difficulties with multiple databases
    Werner Vogels on Wikipedia;
    however, he never said the above

  17. Popular solutions for keeping DBs consistent:
    From the application
    • Two-phase commit (2PC)
    • Saga pattern
    Difficulties with multiple databases
    From the database
    • Logical replication
    • Physical replication

  18. Popular solutions for keeping DBs consistent:
    From the application
    • Two-phase commit (2PC)
    • Saga pattern
    Difficulties with multiple databases
    From the database
    • Logical replication
    • Physical replication

    You can use it easily
    with Amazon Aurora!

  19. Yes, that’s Global Database

  20. What is Global Database?

  21. • Storage replication across regions
    • High performance with physical volume replication
    Amazon Aurora Global Database
    Aurora cluster (primary)
    Tokyo region
    Writer & readers
    Cluster volumes
    Aurora cluster (secondary)
    Osaka region
    Readers
    Cluster volumes
    Outbound replication

  22. Supported engines:
    On Aurora MySQL
    • Ver 2.11+ (minor versions)
    Amazon Aurora Global Database
    On Aurora PostgreSQL
    • Ver 11.17+ (minor versions)
    • Ver 12.12+ (minor versions)
    • Ver 13.8+ (minor versions)
    • Ver 14.5+ (minor versions)

  23. Just click “Add AWS Region” on an existing cluster, and that’s it 👍
    How to use Global Database

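The console click above can also be scripted. The following is a minimal sketch, assuming the two arguments are boto3 RDS clients (one per region, for example `boto3.client("rds", region_name="ap-northeast-1")` for Tokyo and `"ap-northeast-3"` for Osaka). The function name and all identifiers passed in are hypothetical; only `create_global_cluster` and `create_db_cluster` are real boto3 operations.

```python
def enable_global_database(rds_primary, rds_secondary, *, global_id,
                           primary_cluster_arn, secondary_cluster_id,
                           engine, engine_version):
    """Turn an existing Aurora cluster into a global database with one secondary region.

    rds_primary / rds_secondary are assumed to be boto3 RDS clients for the
    primary and secondary regions; every identifier is supplied by the caller.
    """
    # Step 1: wrap the existing cluster in a new global cluster (primary region).
    rds_primary.create_global_cluster(
        GlobalClusterIdentifier=global_id,
        SourceDBClusterIdentifier=primary_cluster_arn,
    )
    # Step 2: create an empty secondary cluster in the other region and
    # attach it to the global cluster; storage is then replicated into it.
    rds_secondary.create_db_cluster(
        DBClusterIdentifier=secondary_cluster_id,
        Engine=engine,
        EngineVersion=engine_version,
        GlobalClusterIdentifier=global_id,
    )
    return global_id
```

The secondary cluster still needs at least one reader instance (`create_db_instance`) before it can serve queries, and an encrypted cluster probably needs extra parameters such as a KMS key in the secondary region; both are left out of this sketch.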

  24. Demo 1: Enable Global Database

  25. Congratulations 🎉
    It’s global now.
    ...is that all??
    🥳

  26. No, it’s just the beginning...
    What you should do
    in an actual disaster matters!

  27. Two options for failing over Global Database

  28. We have 2 options for failing over regions in Global Database
    Planned failover (managed)
    • One click on the console
    • Cannot be used in an emergency
    What should you do in a disaster?
    Unplanned failover
    • Manual operation in several steps
    • Available even during a disaster

  29. We have 2 options for failing over regions in Global Database
    Planned failover (managed)
    • One click on the console
    • Cannot be used in an emergency
    What should you do in a disaster?
    Unplanned failover
    • Manual operation in several steps
    • Available even during a disaster
    👇
    Use it first!

  30. When you run Global Database in Tokyo (primary) and Osaka (secondary) ...
    Operation steps in a disaster
    Aurora cluster (primary)
    Tokyo region
    Aurora cluster (secondary)
    Osaka region
    Primary

  31. Disaster occurred!
    Operation steps in a disaster
    Aurora cluster (primary)
    Tokyo region
    Aurora cluster (secondary)
    Osaka region

  32. Then you should remove the secondary cluster from Global Database
    Operation steps in a disaster
    Aurora cluster (primary)
    Tokyo region
    Aurora cluster (standalone)
    Osaka region
    Remove from
    Global DB

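The detach step above can be sketched with boto3 as follows. This is an illustration under assumptions, not the speaker's actual script: `rds_secondary` is assumed to be a boto3 RDS client for the surviving region (Osaka here), and the function name and identifiers are hypothetical. `remove_from_global_cluster` is a real boto3 operation, and its cluster parameter is spelled `DbClusterIdentifier` and expects the cluster ARN.

```python
def detach_secondary(rds_secondary, *, global_id, secondary_cluster_arn):
    """Detach the secondary cluster so it becomes a standalone, writable cluster.

    rds_secondary is assumed to be a boto3 RDS client created for the
    secondary region, because the primary region may be unreachable.
    """
    # After this call the Osaka cluster no longer receives replication
    # and is promoted to a standalone cluster with its own writer.
    return rds_secondary.remove_from_global_cluster(
        GlobalClusterIdentifier=global_id,
        DbClusterIdentifier=secondary_cluster_arn,
    )
```

Any write that had not yet been replicated when the cluster is detached is lost, which is why the replication lag matters for the RPO.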

  33. After the disaster has passed, you can rebuild Global Database from Osaka (the new primary)
    Operation steps after a disaster
    Aurora cluster (old)
    Tokyo region
    Aurora cluster (primary)
    Osaka region
    Aurora cluster (secondary)
    Rebuild GDB
    Primary

  34. Demo 2: Unplanned failover in an emergency

  35. After the disaster has passed, you can rebuild Global Database from Osaka (the new primary)
    Operation steps after a disaster
    Aurora cluster (old)
    Tokyo region
    Aurora cluster (primary)
    Osaka region
    Aurora cluster (secondary)
    Rebuild GDB
    Primary

  36. On a peaceful day, you can switch the regions back with managed planned failover
    Operation steps after a disaster
    Aurora cluster (old)
    Tokyo region
    Aurora cluster (secondary)
    Osaka region
    Aurora cluster (primary)
    Managed
    failover
    Primary

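The managed planned failover above maps to a single API call. A minimal sketch, assuming `rds` is a boto3 RDS client and that the target is the ARN of the secondary cluster that should become the new primary; the function name and identifiers are hypothetical, while `failover_global_cluster` is a real boto3 operation.

```python
def planned_failover(rds, *, global_id, target_cluster_arn):
    """Ask Aurora to swap primary and secondary roles in a healthy global database.

    rds is assumed to be a boto3 RDS client; target_cluster_arn is the
    secondary cluster that should take over as primary.
    """
    # Aurora coordinates the role change itself, so this only works
    # while both regions are healthy (not during an actual outage).
    return rds.failover_global_cluster(
        GlobalClusterIdentifier=global_id,
        TargetDbClusterIdentifier=target_cluster_arn,
    )
```

Newer SDK releases may expose this planned operation under a separate switchover call, so it is worth checking the current RDS API reference before relying on this name.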

  37. Demo 3: Planned failover (managed)

  38. What I learned by using Global Database
    in production

  39. Make a DR operation flowchart beforehand!
    Disaster occurred!

  40. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR

  41. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR
    Check health of Osaka cluster using SQL

  42. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR
    Check health of Osaka cluster using SQL
    Shut requests out by enabling redirect

  43. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR
    Check health of Osaka cluster using SQL
    Shut requests out by enabling redirect
    Check replication lag

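One possible way to implement the "check replication lag" step is to read the `AuroraGlobalDBReplicationLag` CloudWatch metric (reported in milliseconds) for the secondary cluster. The sketch below assumes `cloudwatch` is a boto3 CloudWatch client for the secondary region; the function names, the five-minute window, and the choice of the `DBClusterIdentifier` dimension are assumptions to verify against your own metrics, and the slides do not say how the speaker measured lag.

```python
from datetime import datetime, timedelta, timezone


def max_replication_lag_ms(cloudwatch, secondary_cluster_id, minutes=5):
    """Return the worst replication lag (ms) seen recently, or None if no data."""
    end = datetime.now(timezone.utc)
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="AuroraGlobalDBReplicationLag",
        # Assumption: the metric is queried per secondary cluster.
        Dimensions=[{"Name": "DBClusterIdentifier", "Value": secondary_cluster_id}],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    points = resp.get("Datapoints", [])
    return max((p["Maximum"] for p in points), default=None)


def lag_within_rpo(lag_ms, rpo_seconds):
    """True only when a lag value exists and fits inside the RPO."""
    return lag_ms is not None and lag_ms <= rpo_seconds * 1000
```

Treating missing data as "not within RPO" is a deliberate choice here: during a regional outage the metric may stop updating, and a human should then decide whether to continue.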

  44. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR
    Check health of Osaka cluster using SQL
    Shut requests out by enabling redirect
    Check replication lag
    Modify DNS record to switch Aurora endpoints

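The DNS step could look like the sketch below if the application reaches Aurora through a CNAME in Amazon Route 53. That is an assumption: the slides do not name the DNS service, and the function names, record name, TTL, and hosted zone ID are all hypothetical. `change_resource_record_sets` is a real boto3 Route 53 operation.

```python
def build_cname_upsert(record_name, target_endpoint, ttl=60):
    """Build a Route 53 change batch that points record_name at target_endpoint."""
    return {
        "Comment": "DR: point the application at the new Aurora endpoint",
        "Changes": [{
            "Action": "UPSERT",  # create the record, or overwrite it if it exists
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": target_endpoint}],
            },
        }],
    }


def switch_db_endpoint(route53, hosted_zone_id, record_name, target_endpoint, ttl=60):
    """Apply the change; route53 is assumed to be a boto3 Route 53 client."""
    return route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_cname_upsert(record_name, target_endpoint, ttl),
    )
```

Keeping the application on a stable DNS name means only this record changes during failover; a short TTL shortens the time clients keep resolving the old endpoint.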

  45. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR
    Check health of Osaka cluster using SQL
    Shut requests out by enabling redirect
    Check replication lag
    Modify DNS record to switch Aurora endpoints
    Unplanned failover on Global Database

  46. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR
    Check health of Osaka cluster using SQL
    Shut requests out by enabling redirect
    Check replication lag
    Modify DNS record to switch Aurora endpoints
    Unplanned failover on Global Database
    Check health of app in Osaka

  47. Make a DR operation flowchart beforehand!
    Disaster occurred!
    Decide to activate DR
    Check health of Osaka cluster using SQL
    Shut requests out by enabling redirect
    Check replication lag
    Modify DNS record to switch Aurora endpoints
    Unplanned failover on Global Database
    Check health of app in Osaka
    within RTO

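The flowchart above can be expressed as an ordered runbook that stops at the first failing step and reports whether the whole run fit inside the RTO. This is a generic sketch in plain Python, not the Exastro or Ansible automation the speaker used; `run_runbook` and the step names are hypothetical, and each action is a placeholder callable that returns True on success.

```python
import time


def run_runbook(steps, rto_seconds, clock=time.monotonic):
    """Run (name, action) steps in order and stop at the first failure.

    Each action is a zero-argument callable returning True on success.
    The result records what completed and whether the run met the RTO.
    """
    start = clock()
    completed = []
    failed_step = None
    for name, action in steps:
        if not action():
            failed_step = name
            break
        completed.append(name)
    elapsed = clock() - start
    return {
        "ok": failed_step is None,
        "failed_step": failed_step,
        "completed": completed,
        "elapsed": elapsed,
        "within_rto": failed_step is None and elapsed <= rto_seconds,
    }
```

A run might be wired up as `run_runbook([("check Osaka cluster health", check_db), ("shut requests out", enable_redirect), ("check replication lag", check_lag), ("switch DNS", switch_dns), ("unplanned failover", detach), ("check app health", check_app)], rto_seconds=3600)`, where every callable and the one-hour RTO are placeholders for your own values.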

  48. In my case...
    • Automated all the operations with Exastro (Japanese software)
    • Separated operations into groups, making it easy to adapt to the situation
    • Used Ansible for the operations that involve on-prem network components
    You will make mistakes in an emergency. Automate it!

  49. If you can, plan regular training with all the stakeholders
    • Operations team
    • Infrastructure developers (including DBA)
    • Application developers
    • Management (who can decide to activate DR)
    And practice DR operations regularly!

  50. Thank you!
    Minoru Onda
    How to make DR-ready system
    with Amazon Aurora Global Database
    Give me your feedback!