Design patterns for multi-Region applications and data in AWS (DAT207)

YouTube link: https://www.youtube.com/watch?v=pbrR9IBcx2E

AWS has the largest infrastructure footprint in the world, spanning countries and continents. This makes it possible to build multi-Region applications that serve user requests with low latency from nearly anywhere, tolerate all sorts of possible outages including major Region-level incidents, and comply with data regulatory requirements. Join us to learn about design patterns for building multi-Region applications on AWS with a focus on low latency and high availability. Explore common deployment patterns used by innovative digital native companies and global corporations, and learn about trade-offs with each pattern.

AMEY BANARSE

December 03, 2024

Transcript

  1. © 2024, Amazon Web Services, Inc. or its affiliates. All rights reserved.

    Sponsored by YugabyteDB. Design patterns for multi-Region applications and data in AWS (DAT207-S). Amey Banarse, VP of Solutions Engineering, YugabyteDB
  2. Amey Banarse

    VP of Solutions Engineering, YugabyteDB
  3. © 2024 – All Rights Reserved

    Run your business-critical applications using PostgreSQL-compatible and Cassandra-inspired APIs while enjoying seamless scalability, built-in resilience, flexible geo-distribution, and cost efficiency, without compromising on performance
  4. YugabyteDB: Building on Top of Postgres Innovations

    1. Distributed Postgres: Fully PostgreSQL compatible API for workload portability. Leverage resilience, dynamic scalability, and multi-site distribution in the DB to make your app cloud native.
    2. Enterprise Grade by Default: Bring capabilities of leading commercial RDBMS to Postgres in a cloud-native architecture, e.g. DR and replication, performance and observability, security, etc.
    3. Architected as a Cloud DBMS: Postgres architected as a flexible managed service in any public or private cloud. We are reimagining Postgres as a native cloud service, not just running it in the cloud.
  5. PostgreSQL has become the default database API

    ◦ Powerful RDBMS capabilities: matches Oracle features
    ◦ Robust and mature: hardened over 30 years
    ◦ Fully open source: permissive license, large community
    ◦ Cloud providers adopting: managed services on all clouds

    “Most popular database” of 2022*
    “DBMS of the year” over multiple years** (2017, 2018, 2020)

    * https://www.eversql.com/most-popular-databases
    ** https://db-engines.com/en/blog_post/85
  6. PostgreSQL Compatibility and Cloud Native Architecture are Critical

    How much Postgres compatibility?
    • Wire: Can use PostgreSQL client drivers and psql shell
    • Syntax: Parse PG syntax, but execution is different
    • Feature: Supports some advanced PG features, but they will work differently
    • Runtime: Exactly like Postgres. Port over all existing apps, PG developers instantly at home.

    How cloud native (distributed) is the architecture?
    • Low: Cannot deliver high data durability, availability, scale, best in class DR
    • Medium: Delivers data durability and some vertical scale. Weak HA, horizontal scale, DR
    • High: High data durability, availability, scalability, DR, multi-region
  7. PostgreSQL Compatibility and Cloud Native Architecture are Critical

    PG Innovation Threshold: Can benefit from Postgres innovation (like pgvector for gen AI, QoS for multi-tenancy, etc.)
  8. PostgreSQL Compatibility and Cloud Native Architecture are Critical

    PG Innovation Threshold: Can benefit from Postgres innovation (like pgvector for gen AI, QoS for multi-tenancy, etc.)
    Cloud DBMS Innovation Threshold: Can innovate on distributed, cloud native architecture (like zero downtime, global apps, fast auto-scaling, connection scaling, etc.)
    Above both thresholds: Can innovate on both dimensions
  9. PostgreSQL Compatibility and Cloud Native Architecture are Critical

    Cloud DBMS Innovation Threshold
    PG Innovation Threshold
  10. Resilience

    The ability of a system to readily respond to or recover from change, disruption, or a crisis
  11. Resilience was always critical: So what changed?

    • Cloud native = More failures: Commodity servers fail, network interruptions are common
    • Bigger scale = More failures: More apps as everything is digital and more headless services
    • Viral success = More failures: Unexpected successes can overwhelm systems
  12. Just resilience is no longer enough . . .
  13. Modern applications demand ultra-resilience

    • Customers expect always-on apps
    • Nations run on digital infrastructure
    • Brand reputation requires uptime
  14. Modern applications need resilience built into Postgres, not layered on top of it
  15. From resilience to ultra-resilience . . . for no downtime, no limits

    • In-Region resilience
    • Multi-Region BCDR
    • Zero-downtime operations
    • Data protection
    • Peak and freak events
    • Grey failures
  16. How do you architect for zero downtime with YugabyteDB?

    • Assume nodes and zones will fail often
    • Users should have zero impact
    • RPO=0, RTO~3s with sync replication
    • Replication lag typically <500 ms with async

    Diagram: Sync replication across regions within a cluster (Region 1, Region 2, Region 3); async replication between two clusters in different regions
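
The sync versus async figures on this slide imply a trade-off that can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not anything shown in the talk: it assumes writes arrive at a steady rate and that every write still in flight to the second cluster is lost if the source cluster fails. The function name and the 10,000 writes/sec rate are hypothetical; only the 500 ms lag comes from the slide.

```python
def writes_at_risk(write_ops_per_sec: float, replication_lag_ms: float) -> float:
    """Estimate acknowledged writes not yet replicated at the moment of failure.

    Assumes a steady write rate and that all writes inside the lag window
    are lost if the source cluster disappears (a simplification).
    """
    return write_ops_per_sec * replication_lag_ms / 1000.0


# Hypothetical workload: 10,000 writes/sec (not a figure from the talk).
HYPOTHETICAL_WRITE_RATE = 10_000

# Synchronous replication: a write is acknowledged only after it is
# replicated, so the lag window is zero and RPO = 0.
sync_exposure = writes_at_risk(HYPOTHETICAL_WRITE_RATE, 0)

# Asynchronous replication at the slide's upper bound of 500 ms lag.
async_exposure = writes_at_risk(HYPOTHETICAL_WRITE_RATE, 500)

print(f"sync: {sync_exposure:.0f} writes at risk")
print(f"async: {async_exposure:.0f} writes at risk")
```

Under these assumptions the async cluster pair exposes about half a second of writes, which is the price paid for keeping cross-region latency out of the write path.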
  17. Let’s dive into real-world examples of ultra-resilient architectures
  18. Business objective: Get Paramount+ closer to their end users

    With the anticipated expansion through globalization and the release of new services and content, Paramount+ needed a database platform that could perform and scale to support peak demands and provide the best user experience.

    Multi-Region/cloud deployment
    • High availability and resilience
    • Performance at peak scale

    Compliance with local laws
    • Conform to GDPR regulations
    • Conform to local security laws
  19. YugabyteDB and Paramount+: Global authentication and user profiles

    Use case: Powers user login, authorizes content viewing, and manages profile information (watchlist, account details, preferences, etc.). YugabyteDB is the system of record for global authentication and all user profiles to view content.

    Past challenges: Paramount+ original single-region architecture with MySQL
    • Slow read performance due to MySQL being limited to a single region
    • No horizontal scalability due to limited primary and follower architecture (a single 64-core node handled all writes)
    • Costly downtimes due to no region-level fault tolerance in single-region architecture
    • Potential data loss due to high replication lag and potential primary node failure
  20. New multi-Region design that powered Super Bowl 2024

    • Multi-Region: Stretch Sync deployment
      o Verified RPO=0 and RTO <10 secs on failures
      o Global DB: 3 Regions (east, central, west) on public cloud IaaS, 5 AZs, replication RF=5
    • Performance and scalability
      o Read latencies <30 ms, transactional multi-region write latencies ~100 ms
      o Scaled clusters seamlessly for peak events (AFC playoffs, Top Gun: Maverick launch, etc.)
    • PostgreSQL runtime compatibility
      o Live migration from MySQL to YugabyteDB
    • Ecosystem integrations and extensibility
      o Compliance with local laws for data residency
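
The RF=5 across 3 Regions layout can be checked against a simple majority-quorum rule. YugabyteDB replicates data with Raft, which needs a majority of replicas to stay available. The sketch below assumes one particular placement (2 replicas in east, 1 in central, 2 in west); the slide does not state the actual distribution, and the function names are illustrative only.

```python
from typing import Mapping


def quorum_size(replication_factor: int) -> int:
    """Smallest majority of replicas, as required by Raft-style consensus."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Number of replicas that can be lost while a majority remains."""
    return replication_factor - quorum_size(replication_factor)


def survives_region_loss(replicas_per_region: Mapping[str, int]) -> bool:
    """True if losing the region with the most replicas still leaves a majority."""
    rf = sum(replicas_per_region.values())
    worst_case_loss = max(replicas_per_region.values())
    return rf - worst_case_loss >= quorum_size(rf)


# Assumed placement for RF=5 over 3 regions (not stated on the slide).
assumed_placement = {"east": 2, "central": 1, "west": 2}

print(quorum_size(5))                           # majority needed out of 5 replicas
print(tolerated_failures(5))                    # replicas that may fail
print(survives_region_loss(assumed_placement))  # any single region can be lost
```

With this placement no Region holds more than two of the five replicas, so any single Region can fail and the remaining three replicas still form a majority. Concentrating three replicas in one Region would break that property.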
  21. YugabyteDB and Paramount+: Real-world success story

    Consistent global growth on YugabyteDB
    • Launched AFCs, Grammys, Top Gun Maverick
    • Expecting 3-6x growth across some events
    • Helped expand business to the EU region for Paramount+ International

    https://www.paramountpressexpress.com/cbs-sports/shows/nfl-on-cbs/releases/?view=109115-nfl-on-cbs-scores-the-most-watched-nfl-divisional-playoff-game-ever-with-more-than-50-million-viewers
  22. Peak event: Super Bowl 2024

    Use case
    • Media livestreaming platform
    • User registrations and entitlement lookup

    Peak scale
    • CBS Sports’ presentation of Super Bowl LVIII was the most-watched telecast in history, with 123.4 million viewers across platforms

    Challenges
    • Massively scaling user entitlements lookup
    • Resilience
    • Low latency for users around the world
  23. YugabyteDB and a large financial investment firm: A success story

    Use case: This financial app aggregates and stores retail customers' portfolio data, organizing it by customer/account and seamlessly transmitting it to fintechs and third-party aggregators such as Intuit.

    Past challenges
    • Federal regulations mandated increased data retention from 4 days to 2 years, but the projected cost of the current solution was high
    • Anticipated the service becoming more critical, resulting in more queries and a need for more efficiency and flexibility
    • Had to move from on-prem to AWS to support resilience and higher scale at lower cost and with higher agility

    Key requirements
    • Support 150 TB (up from 2 TB)
    • Support 150K ops/sec (up from 1K ops/sec)
    • 600K bulk ops/sec
    • Predictable reads at scale with P99 <10 ms
    • Resilience and availability: effective business continuity for cloud outage scenarios at very high throughput
  24. Large financial investment firm: Technical results achieved

    Scalability
    • 15 nodes across 3 AZs, RF=3 with data scale up to 150 TB
    • 700K ops/sec
    • Data retention of 2 years

    Performance: P99 read latency <3 ms, write latency <5 ms

    Resilience: YugabyteDB is deployed in each AWS Region across multiple zones for business continuity

    Disaster recovery
    • YugabyteDB clusters in two AWS Regions (us-east-1 and us-east-2)
    • Bidirectional, asynchronous data replication between them

    Diagram: AWS Region us-east-1 (Availability Zones 1, 2, 3) and AWS Region us-east-2 (Availability Zones 1, 2, 3), linked by async replication
  25. Global retailer survives regional cloud outage

    Use case: YugabyteDB powers the Global Product Catalog system serving 1.6 billion products to end customers in the US
    • 36 nodes across 3 public IaaS Regions: US-East, US-West and US-Central
    • Implement Preferred Leaders for low latency access in US-Central
    • Cluster and topology aware drivers quickly identify newly added nodes
    • Supports reactive microservices and event-driven architecture patterns

    Diagram: Multi-Region Product Catalog YugabyteDB cluster (US-West, US-Central, US-East)
  26. Global retailer survives regional cloud outage

    Results achieved with YugabyteDB
    • Service remained resilient and available through Texas cloud and power outage
    • Applications automatically redirected to other regions
    • No data loss (RPO = 0) and RTO <10 secs
    • Sustaining high throughput of 250K+ TPS, geo-distributed for low latency read access
  27. Lessons learned

    ✓ Prepared for unexpected bursts
    ✓ Built for expected peaks
    ✓ Surviving DDoS attacks
    ✓ Flexible expansion, anywhere
    ✓ Multitenancy
    ✓ No performance compromise
  28. What can go wrong… What you want…

    What can go wrong…
    • Entire Region or data center failure: low probability, but we see it happen regularly
    • Failures that last a while
    • Complex process to “heal” once the Region/DC is back online

    What you want…
    • Ability to trade off between steady-state performance (latency) and potential data loss (RPO)
    • Very quick recovery (low RTO)
    • Ability to run DR drills, planned switchover and chaos testing
  29. Customers around the world trust YugabyteDB to run their business
  30. YugabyteDB is on a path to become the default database in the enterprise

    • 2016 to 2023, Resilience and scale: Scalable YSQL and YCQL, sync and async replication, geo-residency. Great for workloads that need resilience (HA), scale, or geo-distribution
    • 2024, Enhanced PG compatibility: Like Postgres + built-in resilience, on-demand scaling. Great for mid-size workloads that may need unpredictable scale in the future
    • 2025, Serverless: Serverless offering for small workloads that go to 0. Great for small scale standalone cloud native workloads
    • 2026 to 2027, Multitenancy: Workload consolidation on a single cluster. Great for low scale workloads that require QoS

    CONFIDENTIAL: DO NOT DISTRIBUTE
  31. Thank you!

    Please complete the session survey in the mobile app. Amey Banarse, @ameybanarse