Minoru Onda @minorun365
Software engineer
KDDI Corporation & KDDI Agile Development Center Corporation
AWS COMMUNITY BUILDERS APJ OPEN MIC – MAY
multi-region planning
Started DR planning after Direct Connect outage
[Architecture diagram labels: VPC, Tokyo region, EKS on EC2, Aurora, iPads in shops, private network, on-prem workloads, DX]
Recovery point objective (RPO)
Basic strategies of disaster recovery (DR) on AWS
aws.amazon.com/jp/blogs/architecture/disaster-recovery-dr-architecture-on-aws-part-i-strategies-for-recovery-in-the-cloud/
Difficulties with multiple databases
• To make DR-ready
• To enable Blue/Green deployment
“You build multi-DB, you run it consistently!”
Werner Vogels on Wikipedia; however, he never said the above
Difficulties with multiple databases
• Two-phase commit (2PC)
• Saga pattern
From database
• Logical replication
• Physical replication
☝ You can use it easily with Amazon Aurora!
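The Saga pattern named on this slide can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the two in-memory dictionaries stand in for two separate databases, and every name here (`run_saga`, `orders_db`, `stock_db`, the step functions) is hypothetical. Each step carries a compensating action, and when a step fails the already-completed steps are undone in reverse order.

```python
from typing import Callable, Dict, List, Tuple

# (name, action, compensation); all of these are illustrative assumptions.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    done: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"compensated:{prev_name}")
            return log
        done.append(step)
        log.append(f"done:{name}")
    return log


# Two dictionaries standing in for two independent databases (assumption).
orders_db: Dict[str, str] = {}
stock_db: Dict[str, int] = {"ipad-case": 0}


def create_order() -> None:
    orders_db["o-1"] = "ipad-case"


def cancel_order() -> None:
    orders_db.pop("o-1", None)


def reserve_stock() -> None:
    if stock_db["ipad-case"] < 1:
        raise RuntimeError("out of stock")
    stock_db["ipad-case"] -= 1


def release_stock() -> None:
    stock_db["ipad-case"] += 1


result = run_saga([
    ("create_order", create_order, cancel_order),
    ("reserve_stock", reserve_stock, release_stock),
])
print(result)
```

Because the stock is zero, the second step fails and the order written to the first "database" is removed again. The sketch also shows why this approach is hard to run consistently: every write needs a hand-written compensation, which is the burden that database-level replication avoids.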
Amazon Aurora Global Database
On Aurora PostgreSQL
• Ver 11.17+ (minor versions)
• Ver 12.12+ (minor versions)
• Ver 13.8+ (minor versions)
• Ver 14.5+ (minor versions)
What should you do in a disaster?
Planned failover (managed)
• One-click on console
• Cannot be used in an emergency
Unplanned failover
• Manual operation with steps
• Available even in a disaster
👇 Use it first!
DR
• Check health of Osaka cluster using SQL
• Shut requests out by enabling redirect
• Check replication lag
• Modify DNS record to switch Aurora endpoints
• Unplanned failover on Global Database
• Check health of app on Osaka
within RTO
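The steps on this slide could be driven by a small runbook runner like the sketch below. It is an illustration under stated assumptions, not the speaker's tooling: the step bodies are stubs that always succeed, the step order simply follows the list as it appears on the slide, and the RTO of 3600 seconds, the RPO of 5 seconds, and the lag of 800 ms are made-up values. In a real runbook each stub would run SQL, change DNS, or call the AWS API. Aurora reports global database replication lag in milliseconds, which is why `lag_within_rpo` converts units.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    elapsed_s: float  # seconds since the runbook started


def lag_within_rpo(lag_ms: float, rpo_s: float) -> bool:
    """True if the observed replication lag (ms) fits inside the RPO (s)."""
    return lag_ms / 1000.0 <= rpo_s


def run_runbook(
    steps: List[Tuple[str, Callable[[], bool]]],
    rto_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[StepResult], bool]:
    """Run steps in order, stop at the first failure, and report whether
    every step succeeded within the RTO."""
    start = clock()
    results: List[StepResult] = []
    for name, action in steps:
        ok = bool(action())
        results.append(StepResult(name, ok, clock() - start))
        if not ok:
            break  # do not continue a failover after a failed step
    completed = len(results) == len(steps) and all(r.ok for r in results)
    total = results[-1].elapsed_s if results else 0.0
    return results, completed and total <= rto_s


# Stub steps (hypothetical): replace each lambda with the real operation.
steps = [
    ("check health of Osaka cluster using SQL", lambda: True),
    ("shut requests out by enabling redirect", lambda: True),
    ("check replication lag", lambda: lag_within_rpo(lag_ms=800, rpo_s=5)),
    ("modify DNS record to switch Aurora endpoints", lambda: True),
    ("unplanned failover on Global Database", lambda: True),
    ("check health of app on Osaka", lambda: True),
]
results, within_rto = run_runbook(steps, rto_s=3600)
for r in results:
    print(f"{r.elapsed_s:8.3f}s  {'OK ' if r.ok else 'NG '} {r.name}")
print("within RTO:", within_rto)
```

Passing the clock in as a parameter keeps the runner testable in a drill without waiting in real time, and stopping at the first failed step leaves the decision about how to proceed with the operators.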
(Japanese software)
• Operations separated by group, making it easy to adapt flexibly to the situation
• Using Ansible for operations that include on-prem network components
You will make mistakes in an emergency. Automate it!
• Operations team
• Infrastructure developers (including DBA)
• Application developers
• Management (who can decide to activate DR)
And practice DR operations regularly!