
Do Developers Dream of Stateless Apps?

The version of the presentation given at DevOpsPro2020 on 25.03.2020: https://devopspro.lt

In Blade Runner (based on Philip K. Dick’s novel Do Androids Dream of Electric Sheep?), trained hunters had to “retire” problematic androids. We developers are similar to those hunters: our job is to solve problems. State brings complexity and trouble, and getting rid of it is not always possible. So how can we make our stateful distributed systems highly available?

It’s a story based on the experience that Lukasz gained while working on stateful distributed systems deployed in cloud environments (Azure, AWS). It covers what went well and, more importantly, what went wrong. He’ll start by defining state and explaining the differences between stateful and stateless apps (it’s not so obvious!).
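
The stateful/stateless distinction can be illustrated with a toy request handler. This is a minimal Python sketch, not code from the talk: the names `StatefulCounter` and `handle_stateless` are hypothetical, and a plain dictionary stands in for an external database or cache shared by all instances.

```python
class StatefulCounter:
    """Remembers preceding calls in process memory, so each instance is stateful."""

    def __init__(self) -> None:
        self.visits = 0

    def handle(self) -> int:
        self.visits += 1
        return self.visits


def handle_stateless(store: dict, user: str) -> int:
    """Keeps nothing between calls; the state lives in the external store."""
    store[user] = store.get(user, 0) + 1
    return store[user]


# Two replicas behind a load balancer: each one counts only its own requests.
a, b = StatefulCounter(), StatefulCounter()
a.handle()
b.handle()
print(a.visits, b.visits)  # 1 1, although the user made two requests

# The "stateless" handler only moved its state: the shared store still holds it.
shared = {}  # hypothetical stand-in for a database shared by all replicas
handle_stateless(shared, "alice")
handle_stateless(shared, "alice")
print(shared["alice"])  # 2
```

The stateless version can be scaled out freely, but only because the store now carries the state, so the availability question moves to that store.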

Then Lukasz will discuss the strategies we can use in cloud environments to ensure the high availability of our systems. We’ll go through scaling, multi-region deployments, and why we sometimes need to care where our machines are located.
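
One reason placement matters: coordination services that use majority quorums (ZooKeeper and etcd, for example) need 2n + 1 members to tolerate n failures, which matches the transcript’s “2n + 1 machines” and “minimal HA setup = 3 instances”. A minimal Python sketch, assuming a simple majority quorum and treating a whole availability zone as the unit of failure; the function and zone names are hypothetical.

```python
from typing import Dict


def tolerated_failures(nodes: int) -> int:
    """With 2n + 1 members, a majority quorum survives n failures."""
    return (nodes - 1) // 2


def survives_zone_loss(placement: Dict[str, int]) -> bool:
    """True if a majority of members remains after losing any single zone."""
    total = sum(placement.values())
    majority = total // 2 + 1
    return all(total - count >= majority for count in placement.values())


# Three instances (the minimal HA setup) spread over three zones.
print(tolerated_failures(3))                                         # 1
print(survives_zone_loss({"zone-a": 1, "zone-b": 1, "zone-c": 1}))  # True
# The same three instances squeezed into two zones.
print(survives_zone_loss({"zone-a": 2, "zone-b": 1}))               # False
```

Losing the zone that holds two of the three instances leaves a single member, which is not a majority, so the cluster can no longer reach quorum even though only one zone failed.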

In the third part of the talk, he’ll focus on the tools that help us deal with state and on the high availability features the cloud provides for them. He’ll show a live demo of Azure SQL failover and compare it to Cosmos DB. Lukasz will also discuss Storage and Queues. Understanding the limitations of the tools we use is as important as being aware of what happens under the hood; both are needed to build a reliable architecture.
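
Regional failover, whether in Cosmos DB or in Traffic Manager’s priority routing, can be thought of as “use the first healthy region in my preference order”. The sketch below only illustrates that idea in Python; it is not the Cosmos DB or Traffic Manager API, the function name `pick_region` is hypothetical, and in practice the set of healthy regions would come from the service’s own health probes. The region names are the paired regions from slide 10.

```python
from typing import List, Set


def pick_region(preferred: List[str], healthy: Set[str]) -> str:
    """Return the first healthy region, walking the list in priority order."""
    for region in preferred:
        if region in healthy:
            return region
    raise RuntimeError("no healthy region available")


preferred = ["West Europe", "North Europe"]
print(pick_region(preferred, {"West Europe", "North Europe"}))  # West Europe
# West Europe goes down: traffic fails over to its pair.
print(pick_region(preferred, {"North Europe"}))                 # North Europe
```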

Lukasz will sum up the talk by explaining what an SLA is and how to calculate it for your system (yes, there will be some math). So, are we problem hunters, or are we haunted by problems? Join his presentation, make your system highly available and dream peaceful dreams.
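
The uptime approach used in the transcript (slides 23 to 28) can be reproduced in a few lines of Python. This is a minimal sketch, assuming, as the slides do, that components fail independently; `series` and `parallel` are illustrative names, not from the talk. It also assumes each region runs the same DB and Storage pair, an inference from the numbers: two regions at about 0.9998 each reproduce the 0.99999996 used on slide 28.

```python
import math


def series(*slas: float) -> float:
    """All components are needed: multiply the probabilities of being up."""
    return math.prod(slas)


def parallel(*slas: float) -> float:
    """Any one redundant component is enough: 1 - P(all of them are down)."""
    return 1.0 - math.prod(1.0 - sla for sla in slas)


region = series(0.9999, 0.9999)         # DB and Storage both needed
two_regions = parallel(region, region)  # Europe or US is enough
system = series(0.9999, two_regions)    # Traffic Manager in front of both

print(f"{region:.8f}")       # 0.99980001
print(f"{two_regions:.8f}")  # 0.99999996
print(f"{system:.8f}")       # 0.99989996
```

A single region lands at about 99.98%; adding a second region behind Traffic Manager brings the whole system back to roughly 99.99%, at which point Traffic Manager itself is the limiting factor.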

rauluka7

March 25, 2020

Transcript

  1. State: “A condition or way of being that exists at a particular time.” (Cambridge Dictionary)
  2. State in computer science: “(...) a program is described as stateful if it is designed to remember preceding events or user interactions.” (Wikipedia)
  6. Multi-region: Azure vs AWS
      Azure
      • 58 Regions... no, Azure, you have 52 ;)
      • 10 Availability Zones
      AWS
      • 23 Regions
      • 70 Availability Zones (AZ)
      • 2 or more AZs per region (except the Osaka region)
  7. Why should we care?
      • Distributed systems need coordination:
        • distribute configuration
        • synchronize state
      • 2n + 1 machines
      • Minimal HA setup = 3 instances
  8. Azure SQL
      • DB as a managed service
      • Microsoft SQL Server Database Engine
      • Scalability & high availability features
  9. Scalability
      • Automatic scaling
      • Use elastic pools - databases share assigned resources
      • Implement a custom solution based on DB metrics
  10. Paired regions (e.g. West Europe and North Europe)
      • Physical isolation - if possible, at least 300 miles apart
      • Region recovery order - if multiple regions fail, one region of each pair is prioritized for recovery
      • Sequential updates - minimize the impact of bugs or breaking changes
  11. But there is hope... Azure SQL has recently gained new features:
      • Business Critical tier
      • Zone-redundant deployments
  12. Amazon RDS
      • Amazon Aurora, MySQL, PostgreSQL ...
      • Multi-AZ deployment
      • Synchronous replication to a standby instance in a different AZ
  13. Cosmos DB
      • Multi-model DB
      • APIs: SQL, MongoDB, Cassandra, Gremlin and more
      • Document-based, table-row, graph, key-value
  14. High availability features
      • Single-master (multiple readers) replication
      • Multi-master replication
      • Add and remove regions on the go
  15. Cosmos DB - high availability
      • Multi-homing API - interact with the replica that is closest to you
      • Regional failovers
  16. HA - behind the scenes
      • Partitions are regionally redundant
      • Within a region, every partition is replicated
      • Replicas are spread across 10 - 20 fault domains
      • 4 replicas × number of regions
  17. Cosmos DB - limitations
      • Maximum document size is 2 MB
      • Only 5 geospatial functions
      • Remember - it's a document-based DB
      • Consistency levels - choose one and handle its consequences
  18. Azure Storage V2
      • Zone-redundant storage (ZRS) - 3 clusters in different AZs
      • Geo-zone-redundant storage (GZRS) - ZRS + secondary region
      • Still, reading from the secondary region requires RA-GZRS ;)
  19. Azure Queue
      • Integrate different parts of a system, or different systems
      • Infinite time to live (TTL)
      • Up to 500 TB
      • Order is not guaranteed (Service Bus message sessions guarantee it)
  20. Storage - experience
      • Use the latest APIs and SDKs (the Java ones are not as good as the .NET ecosystem, but they are improving!)
      • Take care of the storage life cycle
      • Be ready for migration
  21. SLA: “An SLA is a contractual agreement between a service provider and a customer buying a service.” (What Are The Chances An Availability SLA Will Be Violated? [1])
  22. SLA: “The agreement stipulates some minimum Quality of Service (QOS) requirement.” (What Are The Chances An Availability SLA Will Be Violated? [1])
  23. SLA example
      Let's assume that we need both services at the same time:
      • DB 99.99%
      • Storage 99.99%
      What's our SLA?
      • Use the uptime approach
  24. Probability: “When two events, A and B, are independent, the probability of both occurring is: P(A and B) = P(A) * P(B)” (MathGoodies.com)
  25. Uptime approach
      DB↑ - DB is up, S↑ - Storage is up
      P(DB↑) = 0.9999, P(S↑) = 0.9999
      P(DB↑) * P(S↑) = 0.9999 * 0.9999 = 0.99980001 ≈ 99.98%
  26. Multi-region SLA
      Traffic Manager 99.99%
      Europe↑ - Europe is up, US↑ - US is up
      Europe↓ - Europe is down, US↓ - US is down
      P(Europe↑ or US↑) = 1 - P(Europe↓ and US↓)
  27. Multi-region SLA (continued)
      Assuming the regions fail independently:
      P(Europe↑ or US↑) = 1 - P(Europe↓) * P(US↓)
  28. Multi-region SLA
      TM↑ - Traffic Manager is up, TM = 99.99%
      P(TM↑) * P(Europe↑ or US↑) = 0.9999 * 0.99999996 ≈ 0.9999 (99.99%)
  30. Call to action!
      1. Calculate the SLA for your system.
      2. Play with HA features and be ready for failure.
      3. Care about what happens in the cloud.
  31. Bibliography
      • https://nofluffjuststuff.com/magazine/2017/10/cloud_native_apps_must_be_stateless_myth_or_fact_
      • https://www.researchgate.net/publication/221056071_What_Are_the_Chances_an_Availability_SLA_will_be_Violated
      • https://vincentlauzon.com/2018/01/22/solution-slas-in-azure/
      • http://tuxlabs.com/?p=267
      • https://www.bizety.com/2018/08/21/stateful-vs-stateless-architecture-overview/
      • https://www.xenonstack.com/insights/stateful-and-stateless-applications/
      • https://docs.microsoft.com/en-us/azure/architecture/aws-professional/
      • https://cloudacademy.com/blog/aws-regions-and-availability-zones-the-simplest-explanation-you-will-ever-find-around/
  32. Bibliography
      • https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
      • https://docs.aws.amazon.com/en_pv/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
      • https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions
      • https://docs.microsoft.com/en-us/azure/sql-database/sql-database-disaster-recovery-strategies-for-applications-with-elastic-pool
      • https://azure.microsoft.com/en-us/blog/azure-sql-databases-disaster-recovery-101/
      • https://docs.microsoft.com/en-us/azure/sql-database/sql-database-high-availability
      • https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group
      Graphics attributions:
      • https://www.behance.net/gallery/64391071/Tinkoff-systems?tracking_source=search%7Ccyber%20map
      • https://www.studiodaily.com/2018/01/designing-retro-tech-look-future-blade-runner-2049/
      • https://www.behance.net/gallery/42879183/URBS?tracking_source=search%7Cfuturistic%20warehouse