Slide 1

Slide 1 text

From Resilience to Ultra-Resilience of Data for Modern Applications
Amey Banarse, VP of Solutions Engineering, YugabyteDB

Slide 2

Slide 2 text

© 2024 – All Rights Reserved
Run your business-critical applications with YugabyteDB, using PostgreSQL-compatible & Cassandra-inspired APIs, while enjoying:
● Seamless scalability
● Built-in resilience
● Flexible geo-distribution
● Cost efficiency
All without compromising on performance.

Slide 3

Slide 3 text

PostgreSQL has become the default database API
○ Powerful RDBMS capabilities: matches Oracle features
○ Robust and mature: hardened over 30 years
○ Fully open source: permissive license, large community
○ Cloud providers adopting: managed services on all clouds
“Most popular database” of 2022
“DBMS of the year” over multiple years: 2017, 2018, 2020

Slide 4

Slide 4 text

Not all “PostgreSQL Compatibility” is created equal
Capability | Wire-Protocol Compatibility | Syntax Compatibility | Feature Compatibility | Runtime Compatibility
Compatible with PG client drivers | ✓ | ✓ | ✓ | ✓
Parses PG syntax properly (but execution may be different) | ✘ | ✓ | ✓ | ✓
Supports equivalent features (but with different syntax & runtime) | ✘ | ✘ | ✓ | ✓
Appears and behaves just like PG to applications | ✘ | ✘ | ✘ | ✓

Slide 5

Slide 5 text

Innovative architecture combines best of databases
Pluggable Query Layer
● YSQL API: PostgreSQL compatible
● YCQL API: Cassandra compatible
● Other APIs (future)
Distributed, Transactional Storage Layer
● Automatic sharding
● Load balancing
● Distributed transactions
● Raft consensus
Deploy Anywhere
● On-premises datacenters
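The Raft consensus item in the storage layer implies a simple availability rule: a replicated group keeps accepting writes as long as a majority of its replicas is reachable. A minimal sketch of that arithmetic follows. The function names and the replication factors in the loop are illustrative assumptions; the slide does not state a replication factor.

```python
def majority(replication_factor: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be lost while a majority is still reachable."""
    return replication_factor - majority(replication_factor)


if __name__ == "__main__":
    # Replication factors below are examples only, not values from the slides.
    for rf in (1, 3, 5, 7):
        print(f"RF={rf}: majority={majority(rf)}, "
              f"tolerates {tolerated_failures(rf)} failure(s)")
```

Note that an even replication factor buys no extra fault tolerance over the odd number below it, which is why odd values are the usual choice.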

Slide 6

Slide 6 text

YugabyteDB Voyager to simplify the migration journey
Source databases
● PostgreSQL 9.x 11.x
● Oracle 11g, 12c-19c
● MySQL 8.x
● Amazon Aurora
● Amazon RDS
● Google Cloud SQL
● Azure SQL for PG
Voyager
● Cloud, On-Premise
● RHEL / CentOS / Ubuntu
● MacOS / Docker
Target
● YugabyteDB: OSS, Managed, Anywhere

Slide 7

Slide 7 text

Resilience: the ability of a system to readily respond to or recover from change, disruption, or a crisis

Slide 8

Slide 8 text

Just resilience is no longer enough…

Slide 9

Slide 9 text

Modern applications demand ultra-resilience
● Customers expect always-on apps
● Nations run on digital infrastructure
● Brand reputation requires uptime

Slide 10

Slide 10 text

Resilience to ultra-resilience: what changed?
● Cloud Native = More Failures: commodity servers fail, network interruptions are common
● Bigger Scale = More Failures: more apps as everything is digital, and more headless services
● Viral Success = More Failures: unexpected successes can overwhelm systems

Slide 11

Slide 11 text


Slide 12

Slide 12 text

Major cloud outages aren't uncommon
[Chart: per-quarter outages in Asia Pacific]
“Outages costing companies more than $1 million has increased from 11% to 15% since 2019.”
https://foundershield.com/blog/real-world-statistics-on-managing-cloud-outage-risks/

Slide 13

Slide 13 text

Different failure modes require different elements of resilience
● Infrastructure failures: in-region resilience
● Region and data center outage: multi-region BCDR
● User, app or operator error: data protection
● Upgrades / patching downtime: zero-downtime operations
● Intermittent or partial failures: grey failures
● Massive or unexpected spikes: peak and freak events

Slide 14

Slide 14 text

From resilience to ultra-resilience… for no downtime, no limits
● In-region resilience
● Multi-region BCDR
● Zero-downtime operations
● Data protection
● Peak and freak events
● Grey failures

Slide 15

Slide 15 text

Ultra-resilience gives you no downtime, no limits.

Slide 16

Slide 16 text

Let's dive into real-world examples of ultra-resilience architectures

Slide 17

Slide 17 text

Peak scale

Slide 18

Slide 18 text


Slide 19

Slide 19 text

Business Objective: Get Paramount+ Closer to Their End Users
With the anticipated expansion through globalization and release of new services and content, Paramount+ needed a database platform that could perform and scale to support peak demands to provide the best user experience.
● Multi-Region/Cloud Deployment
  ○ High availability and resilience
  ○ Performance at peak scale
● Compliance with local laws
  ○ Conform to GDPR regulations
  ○ Conform to local security laws

Slide 20

Slide 20 text

Peak Event: Super Bowl 2024
● Use Case
  ○ Media live streaming platform
  ○ User registrations and entitlement lookup
● Peak
  ○ CBS Sports' presentation of Super Bowl LVIII was the most-watched telecast in history, with 123.4 million viewers across platforms
● Challenges
  ○ Massively scaling user entitlements lookup
  ○ Resilience
  ○ Low latency for users around the world

Slide 21

Slide 21 text

Unexpected failures or disasters

Slide 22

Slide 22 text

What matters
✓ Prepared for unexpected bursts
✓ Built for expected peaks
✓ Surviving DDoS attacks
✓ Flexible expansion, anywhere
✓ Multitenancy
✓ No performance compromise

Slide 23

Slide 23 text

Retailer Weathered a Regional Cloud Outage
Top 5 Global Retailer
● Use Case
  ○ Product catalog for a global top 5 retailer
  ○ Over 1.6 billion products
● Freak events
  ○ Snowstorm in Texas took out a cloud region
● Key Challenges
  ○ High availability: keeping the product catalog up during peak holiday season in spite of the cloud outage
  ○ Sustaining high throughput of 250k+ tps
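Sustaining throughput through a region outage is partly a capacity question: the surviving regions must absorb the failed region's share. The sketch below works through that arithmetic. The three-region layout and the even traffic split are assumptions made for illustration only; the slide gives just the 250k+ tps figure, and the function names are hypothetical.

```python
def per_region_load(total_tps: float, regions: int, failed: int = 0) -> float:
    """Load each surviving region carries if traffic is spread evenly."""
    surviving = regions - failed
    if regions < 1 or failed < 0 or surviving < 1:
        raise ValueError("need at least one surviving region")
    return total_tps / surviving


def required_headroom(regions: int, failed: int) -> float:
    """Factor by which per-region load grows when `failed` regions are lost."""
    return per_region_load(1.0, regions, failed) / per_region_load(1.0, regions)


if __name__ == "__main__":
    total = 250_000  # tps figure from the slide
    regions = 3      # assumption: not stated in the source
    print(f"steady state:   {per_region_load(total, regions):,.0f} tps per region")
    print(f"one region out: {per_region_load(total, regions, 1):,.0f} tps per region")
    print(f"headroom factor: {required_headroom(regions, 1):.2f}x")
```

Under these assumptions each region has to be provisioned for 1.5 times its steady-state load to ride out the loss of one region.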

Slide 24

Slide 24 text

Demo: Fail over an entire region and show the impact on the application.

Slide 25

Slide 25 text

We are going Global: US, EU & APJ
● Single YB cluster providing strong consistency across multiple regions
● Scalable and highly available operational data tier
● Business continuity: able to withstand a region failure with RPO=0
● Geo-partitioning, data locality & compliance
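The geo-partitioning bullet can be pictured as pinning each row to a home region based on one of its columns, so that data stays local for latency and compliance. The sketch below shows only that routing idea in plain Python. It is not YugabyteDB's implementation or syntax; the country-to-region mapping, the `country` column and the function names are all hypothetical, and only the three region names come from the slide.

```python
from collections import defaultdict

# Hypothetical mapping from a row's country code to its home region.
REGION_BY_COUNTRY = {
    "US": "us",
    "DE": "eu",
    "FR": "eu",
    "IN": "apj",
    "JP": "apj",
}


def home_region(country_code: str) -> str:
    """Return the region a row must live in, based on its country column."""
    try:
        return REGION_BY_COUNTRY[country_code.upper()]
    except KeyError:
        raise ValueError(f"no home region configured for {country_code!r}") from None


def partition_rows(rows):
    """Group rows by home region, the way a geo-partitioned table would place them."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[home_region(row["country"])].append(row)
    return dict(partitions)


if __name__ == "__main__":
    users = [
        {"id": 1, "country": "de"},
        {"id": 2, "country": "US"},
        {"id": 3, "country": "JP"},
    ]
    for region, rows in sorted(partition_rows(users).items()):
        print(region, [r["id"] for r in rows])
```

Rejecting unmapped countries, rather than falling back to a default region, is a deliberate choice here: silently placing a row in the wrong region would defeat the compliance goal.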

Slide 26

Slide 26 text

Changes, disruptions and crises take many shapes
● Infrastructure failures
● Region and data center outages
● User, app or operator errors
● Downtime from upgrades / patching
● Intermittent or partial failures
● Massive or unexpected spikes

Slide 27

Slide 27 text

Changes, disruptions and crises take many shapes
● Infrastructure failures
● Region and data center outages
● User, app or operator errors
● Downtime from upgrades / patching
● Intermittent or partial failures
● Massive or unexpected spikes
TRADITIONAL RESILIENCE: Only these 2 failure types are addressed

Slide 28

Slide 28 text

What is multi-region resilience?
✓ Protection against region / DC outages to ensure business continuity
✓ E.g., power grid failures, natural disasters
✓ Nations are increasingly mandating multi-region resilience through regulatory compliance

Slide 29

Slide 29 text

What can go wrong…
● Entire region or data center failure: low probability, but we see it happen regularly
● Failures that last a while
● Complex process to “heal” once the region / DC is back online
What you want…
● Ability to trade off between steady-state performance (latency) and potential data loss (RPO)
● Very quick recovery (low RTO)
● Ability to run DR drills: planned switchover
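RPO and RTO from the list above can be checked against the timestamps recorded in a DR drill: the data-loss window runs from the last replicated commit to the failure, and the recovery time runs from the failure to the moment service is restored. The following is a minimal sketch under that reading. The class, field names and sample timestamps are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillResult:
    """Timestamps captured during a (hypothetical) DR drill."""
    last_replicated_commit: datetime
    failure_time: datetime
    service_restored: datetime

    @property
    def data_loss_window(self) -> timedelta:
        # Writes made after the last replicated commit may be lost.
        return max(self.failure_time - self.last_replicated_commit, timedelta(0))

    @property
    def recovery_time(self) -> timedelta:
        return self.service_restored - self.failure_time


def meets_objectives(result: DrillResult, rpo: timedelta, rto: timedelta) -> bool:
    """True when observed data loss and recovery time are within RPO and RTO."""
    return result.data_loss_window <= rpo and result.recovery_time <= rto


if __name__ == "__main__":
    drill = DrillResult(
        last_replicated_commit=datetime(2024, 1, 1, 11, 59, 58),
        failure_time=datetime(2024, 1, 1, 12, 0, 0),
        service_restored=datetime(2024, 1, 1, 12, 0, 45),
    )
    print("data loss window:", drill.data_loss_window)
    print("recovery time:", drill.recovery_time)
    print("meets RPO=5s, RTO=60s:",
          meets_objectives(drill, timedelta(seconds=5), timedelta(seconds=60)))
```

The same drill fails a target of RPO=0, which is the trade-off the slide points at: zero data loss requires the commit to be replicated before it is acknowledged, at the cost of steady-state latency.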

Slide 30

Slide 30 text

Thank You
Join us on Slack: www.yugabyte.com/slack
Star us on GitHub: github.com/yugabyte/yugabyte-db