stateRunnerJavaSummit.pdf

rauluka7
August 02, 2020

In Blade Runner by P. K. Dick, trained hunters had to retire problematic androids. We developers are similar to those hunters: our job is to solve problems. State brings complexity and trouble, and getting rid of it is not always possible. So how do we make our stateful distributed systems highly available?

It’s a story based on the experience I gained while working on stateful distributed systems deployed in cloud environments (Azure, AWS). It covers what went well and, more importantly, what went wrong. I’ll start by defining state and explaining the differences between stateful and stateless apps (it’s not so obvious!).

Then I’ll discuss the strategies that we can use in cloud environments to ensure high availability of our systems. We’ll go through scaling, multi-region deployments, and why we sometimes need to care where our machines are located.

In the third part of this talk, I’ll focus on the tools that help us deal with state and on the high availability features the cloud provides for them. I’ll show you a live demo of Azure SQL failover and compare it to Cosmos DB. I’ll also discuss Storage and Queues. Understanding the limitations of the tools we use is as important as being aware of what happens under the hood; both are needed to build a reliable architecture.

I’ll sum up the talk by explaining what an SLA is and how to calculate it for your system (yes, there will be some math). So, are we problem hunters, or are we haunted by problems? Join my presentation, make your system highly available, and dream peaceful dreams.

Transcript

  1. State “A condition or way of being that exists at a particular time.” Cambridge Dictionary
  2. State - computer science “(...) a program is described as stateful if it is designed to remember preceding events or user interactions.” Wikipedia
  7. Multi-region Azure vs AWS. Azure: • 60 regions... no, Azure, you have 53 ;) • 10 regions with Availability Zones • Minimum 3 AZs per region. AWS: • 23 regions • 70 Availability Zones • 2 or more AZs per region (except the Osaka region)
  8. Why should we care? • Distributed systems need coordination: • distribute configuration • synchronize state • 2n + 1 machines • Minimal HA setup = 3 instances
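
The "2n + 1 machines" rule on this slide can be illustrated with a small sketch. It assumes the coordination layer decides by majority quorum, which the slide does not state explicitly; `quorum_size` and `tolerated_failures` are hypothetical helper names introduced only for this example.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a cluster with `members` voting nodes."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return members - quorum_size(members)


# Print cluster size, quorum size, and tolerated failures for small clusters.
for members in range(1, 8):
    print(members, quorum_size(members), tolerated_failures(members))
```

Under this assumption, a cluster of 2n + 1 machines survives n failures. Three instances are the smallest setup that survives one failure, and a fourth instance adds cost without raising that number.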
  9. Azure SQL • DB as a managed service • Microsoft SQL Server Database Engine • Scalability & High availability features
  10. Scalability • Automatic scaling • Use elastic pool - databases share assigned resources • Implement custom solution based on DB metrics
  11. Paired regions • Physical isolation - if possible, at least 300 miles • Region recovery order - if multiple regions fail, one of each pair is prioritized for recovery • Sequential updates - minimize impact of bugs or breaking changes • Example pair: West Europe / North Europe
  12. But there is hope... Azure SQL has gained new features lately: • Business Critical Tier • Zone Redundant deployments
  13. Amazon RDS • Amazon Aurora, MySQL, PostgreSQL ... • Multi-AZ deployment • Synchronous replication to a standby instance in a different AZ
  14. Cosmos DB • Multi-model DB • API: SQL, MongoDB, Cassandra, Gremlin and more • Document-based, table-row, graph, key-value
  15. High Availability Features • Single-master, multiple-readers replication • Multi-master replication • Add and remove regions on the go
  16. Cosmos DB - High availability • Multi-homing API - interact with the replica that is closest to you • Regional failovers
  17. HA - behind the scenes • Partitions are regionally redundant • Within a region every partition is replicated • Replicas have 10 - 20 fault domains • 4 replicas x REGIONS
  18. Cosmos DB - limitations • Document size is max 2 MB • Only 5 geospatial functions • Remember - it's a document-based DB • Consistency levels - choose and handle
  19. Azure Storage V2 • Zone-redundant storage (ZRS) - 3 clusters in different AZs • Geo-zone-redundant storage (GZRS) - ZRS + secondary region • Still, reading requires RA-GZRS ;)
  20. Azure queue • Integrate different parts of a system or different systems • Infinite time to live (TTL) • 500 TB • Order is not guaranteed (Service Bus message sessions provide it)
  21. Storage - experience • Use the latest APIs and SDKs (the Java ones are not as good as those in the .NET ecosystem, but they are improving!) • Take care of the storage life cycle • Be ready for migration
  22. SLA “An SLA is a contractual agreement between a service provider and a customer buying a service.” What Are The Chances An Availability SLA Will Be Violated? [1]
  23. SLA “The agreement stipulates some minimum Quality of Service (QOS) requirement.” What Are The Chances An Availability SLA Will Be Violated? [1]
  24. SLA Example. Let's assume that we need both services at the same time: • DB 99.99% • Storage 99.99% What's our SLA? • Use uptime approach
  25. Probability. When two events, A and B, are independent, the probability of both occurring is: P(A and B) = P(A) * P(B) MathGoodies.com
  26. Uptime approach • DB↑ - DB is up • S↑ - Storage is up • P(DB↑) = 0.9999 • P(S↑) = 0.9999 • P(DB↑) * P(S↑) = 0.9999 * 0.9999
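
The uptime approach from the slides above can be sketched in a few lines. This is a minimal illustration that, like the slides, treats the services as independent; `serial_availability`, `downtime_minutes_per_year`, and the 365-day year are assumptions made for this example and do not come from the talk.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a non-leap year


def serial_availability(*availabilities: float) -> float:
    """Availability of a system that needs ALL of its independent parts up."""
    return math.prod(availabilities)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime implied by an availability figure."""
    return (1.0 - availability) * MINUTES_PER_YEAR


db = 0.9999       # P(DB↑) from the slide
storage = 0.9999  # P(S↑) from the slide

combined = serial_availability(db, storage)
print(f"combined availability: {combined:.8f}")
print(f"downtime per year: {downtime_minutes_per_year(combined):.1f} minutes")
```

The combined figure comes out at roughly 99.98%: two services that each promise 99.99% give a weaker promise once the system depends on both of them.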
  27. Multiregion SLA • Traffic Manager 99.99% • Europe↑ - up • US↑ - up • Europe↓ - down • US↓ - down • P(Europe↑ or US↑) = 1 - P(Europe↓ and US↓)
  28. Multiregion SLA • Traffic Manager 99.99% • Europe↑ - up • US↑ - up • Europe↓ - down • US↓ - down • P(Europe↑ or US↑) = 1 - P(Europe↓) * P(US↓)
  29. Multiregion SLA • TM↑ - Traffic Manager is up • P(TM↑) = 99.99% • P(TM↑) * P(Europe↑ or US↑) = 0.9999 * 0.99999996
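
The multi-region calculation can be reproduced with the sketch below. It assumes that each region runs the DB plus Storage pair from the earlier example (0.9999 * 0.9999 per region) and that regions fail independently; the slides do not state the per-region figure, but this assumption matches the 0.99999996 shown on the slide. The function names are hypothetical.

```python
import math


def serial_availability(*availabilities: float) -> float:
    """All independent parts must be up at the same time."""
    return math.prod(availabilities)


def parallel_availability(*availabilities: float) -> float:
    """At least one independent replica must be up: 1 - P(all are down)."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


# Assumption: every region hosts DB (99.99%) and Storage (99.99%).
region = serial_availability(0.9999, 0.9999)

# P(Europe↑ or US↑) = 1 - P(Europe↓) * P(US↓)
either_region = parallel_availability(region, region)

# Traffic Manager sits in front of both regions, so it is a serial dependency.
traffic_manager = 0.9999
total = serial_availability(traffic_manager, either_region)

print(f"P(Europe↑ or US↑) = {either_region:.8f}")
print(f"P(TM↑) * P(Europe↑ or US↑) = {total:.8f}")
```

Under these assumptions the result is capped by Traffic Manager: adding a second region pushes the regional term close to 1, so the single component in front of both regions dominates the final figure.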
  31. Call to action! 1. Calculate the SLA for your system. 2. Play with HA features and be ready for failure. 3. Check whether you are using the cloud geo-infrastructure in a way that fits your HA needs.
  32. Bibliography
    • https://nofluffjuststuff.com/magazine/2017/10/cloud_native_apps_must_be_stateless_myth_or_fact_
    • https://www.researchgate.net/publication/221056071_What_Are_the_Chances_an_Availability_SLA_will_be_Violated
    • https://vincentlauzon.com/2018/01/22/solution-slas-in-azure/
    • http://tuxlabs.com/?p=267
    • https://www.bizety.com/2018/08/21/stateful-vs-stateless-architecture-overview/
    • https://www.xenonstack.com/insights/stateful-and-stateless-applications/
    • https://docs.microsoft.com/en-us/azure/architecture/aws-professional/
    • https://cloudacademy.com/blog/aws-regions-and-availability-zones-the-simplest-explanation-you-will-ever-find-around/
  33. Bibliography
    • https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
    • https://docs.aws.amazon.com/en_pv/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
    • https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions
    • https://docs.microsoft.com/en-us/azure/sql-database/sql-database-disaster-recovery-strategies-for-applications-with-elastic-pool
    • https://azure.microsoft.com/en-us/blog/azure-sql-databases-disaster-recovery-101/
    • https://docs.microsoft.com/en-us/azure/sql-database/sql-database-high-availability
    • https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group
    Graphics attributions:
    • https://www.behance.net/gallery/64391071/Tinkoff-systems?tracking_source=search%7Ccyber%20map
    • https://www.studiodaily.com/2018/01/designing-retro-tech-look-future-blade-runner-2049/
    • https://www.behance.net/gallery/42879183/URBS?tracking_source=search%7Cfuturistic%20warehouse