stateRunnerJavaSummit.pdf

Do Developers Dream of Stateless Apps? Java Global Summit, August
2nd by Łukasz Gebel @rauluka7

Do We Dream of Stateless Apps?

State “A condition or way of being that exists at
a particular time.” Cambridge Dictionary

State - computer science “(...) a program is described as
stateful if it is designed to remember preceding events or user interactions.” Wikipedia

State - computer science “The remembered information is called the
state of the system.” Wikipedia

State, state everywhere

Stateful vs Stateless

Fully Stateless

Stateful or Stateless?

Fully Stateful

Awfully Stateful

High availability - strategies

Scaling • Vertical • Horizontal

Multi-region

Multi-region - basic units

Multi-region Azure vs AWS
Azure • 60 Regions... no Azure, you have 53 ;) • 10 regions with Availability Zones • Minimum 3xAZ per region
AWS • 23 Regions • 70 Availability Zones • 2 or more AZ (except Osaka region)

VM Placement Strategies • Fault domains • Update domains

VM Placement Strategies

Why should we care? • Distributed systems need coordination: •
distribute configuration • synchronize state • 2n + 1 machines • Minimal HA setup = 3 instances
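The 2n + 1 rule above can be sketched in a few lines of Java. This is a minimal illustration, assuming a majority-quorum coordination service (ZooKeeper-style); the class and method names are hypothetical and not taken from the talk's code.

```java
/** Quorum arithmetic for a majority-based coordination ensemble (illustrative sketch). */
final class QuorumMath {
    private QuorumMath() {}

    /** Ensemble size needed to survive n failed machines: 2n + 1. */
    static int ensembleSizeFor(int toleratedFailures) {
        if (toleratedFailures < 0) {
            throw new IllegalArgumentException("toleratedFailures must be >= 0");
        }
        return 2 * toleratedFailures + 1;
    }

    /** Smallest strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        if (ensembleSize < 1) {
            throw new IllegalArgumentException("ensembleSize must be >= 1");
        }
        return ensembleSize / 2 + 1;
    }

    /** Number of machines that can fail while a majority is still reachable. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 2; n++) {
            int size = ensembleSizeFor(n);
            System.out.println("n=" + n + " -> machines=" + size + ", majority=" + majority(size));
        }
    }
}
```

Note that an even-sized ensemble buys nothing: under this model 4 machines tolerate only 1 failure, the same as 3, which is why the minimal HA setup is 3 instances.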

I DON'T WANT TO LIVE ON THIS PLANET ANYMORE...

So let's scale Zookeepers! N = 2, 2N + 1 = 5

NOOOOOOOOOOOO!!!!!!!

Highly Available Cloud Services

Database

Azure SQL • DB as a managed service • Microsoft
SQL Server Database Engine • Scalability & High availability features

Scalability Dynamic scaling • not equal to autoscaling! • manual
process

Scalability • Automatic scaling • Use elastic pool - databases
share assigned resources • Implement custom solution based on DB metrics

High Availability Features • Active Geo-Replication • Failover Groups

Active Geo-Replication Primary Database

Active Geo-Replication Secondary Database in the same or different region

Active Geo-Replication Asynchronous data replication

Active Geo-Replication Manual failover to secondary database

Failover groups Secondary DB by default in other region West
Europe North Europe

Failover groups Automatic failover West Europe North Europe

Failover groups Single connection string directing to current primary db
West Europe North Europe YourPrimaryDB.com
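To make the "single connection string" idea concrete, here is a small sketch that builds JDBC URLs against a failover group listener instead of a specific server. It assumes the Microsoft JDBC driver's URL format and the listener naming used by Azure SQL failover groups (`<group>.database.windows.net` for read-write, `<group>.secondary.database.windows.net` for read-only); the class name, group name, and database name are hypothetical, and the exact endpoint format should be checked against the current Azure documentation.

```java
/** Builds JDBC URLs that target a failover group listener rather than one server (sketch). */
final class FailoverGroupUrls {
    private FailoverGroupUrls() {}

    /** URL for the read-write listener; it follows whichever database is currently primary. */
    static String readWriteUrl(String failoverGroup, String database) {
        return "jdbc:sqlserver://" + failoverGroup + ".database.windows.net:1433;"
                + "databaseName=" + database + ";encrypt=true;loginTimeout=30;";
    }

    /** URL for the read-only listener, intended for read traffic served by the secondary. */
    static String readOnlyUrl(String failoverGroup, String database) {
        return "jdbc:sqlserver://" + failoverGroup + ".secondary.database.windows.net:1433;"
                + "databaseName=" + database + ";encrypt=true;applicationIntent=ReadOnly;loginTimeout=30;";
    }

    public static void main(String[] args) {
        // "my-failover-group" and "orders" are placeholder names.
        System.out.println(readWriteUrl("my-failover-group", "orders"));
        System.out.println(readOnlyUrl("my-failover-group", "orders"));
    }
}
```

Because the application only knows the listener name, a failover from West Europe to North Europe does not require a configuration change; existing connections still drop, so the client needs reconnect logic.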

Paired regions • Physical isolation - if possible at least
300 miles • Region recovery order - if multiple regions fail one of each pair is prioritized for recovery • Sequential updates - minimize impact of bugs or breaking changes West Europe North Europe

Demo time!

Azure SQL - my experience

But there is hope... Azure SQL gained new features
lately: • Business Critical Tier • Zone Redundant deployments

Amazon RDS • Amazon Aurora, MySQL, PostgreSQL ... • Multi-AZ
deployment • Synchronous replication to a standby instance in a different AZ

Cosmos DB

Cosmos DB • Multi-model DB • API: SQL, MongoDB, Cassandra,
Gremlin and more • Document based, table-row, graph, key-value

High Availability Features • Single master - multiple readers replication
• Multi master replication • Add and remove regions on the go

Cosmos DB - High availability • Multi-homing api - interact
with replica that is closest to you • Regional failovers

HA - behind the scenes • Partitions are regionally redundant
• Within a region every partition is replicated • Replicas have 10 - 20 fault domains 4 replicas X REGIONS

Demo time!

Cosmos DB - limitations • Document size is max 2 MB
• Only 5 geospatial functions • Remember - it's a document based DB • Consistency levels - choose and handle

Amazon DynamoDB • key-value, document based DB • multiregion •
multimaster

Storage

Storage • Distributed resources, assets • Share resources by HTTPS
• Backups, logs • Long living resources

Locally-redundant storage 3x within datacentre

Geo-redundant storage 3x primary region, then secondary region

Read-access GRS You can read from secondary region ;)

Azure Storage V2

Azure Storage V2 • Zone-redundant storage (ZRS) - 3 clusters
in different AZ • Geo-zone-redundant storage (GZRS) - ZRS + secondary region • Still, reading requires RA-GZRS ;)

Azure queue • Integrate different parts of a system or different
systems • Infinite time to live (TTL) • 500 TB • Order is not guaranteed (Service Bus message sessions provide it)

Azure queue • Unlimited number of concurrent clients

And suddenly • 503 Server busy ...
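One common way to cope with a transient "503 Server busy" is to retry with exponential backoff and jitter. The sketch below is a generic, SDK-independent illustration; `ServerBusyException` is a hypothetical stand-in for whatever error a real storage client raises on a 503, and the attempt count and delays are arbitrary assumptions.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Retries an operation with exponential backoff while the service reports it is busy (sketch). */
final class BusyRetry {
    private BusyRetry() {}

    /** Hypothetical marker for an HTTP 503 response from the service. */
    static final class ServerBusyException extends Exception {
        ServerBusyException(String message) {
            super(message);
        }
    }

    /**
     * Calls the operation up to maxAttempts times. After each busy response it sleeps
     * for the current delay plus random jitter, then doubles the delay.
     */
    static <T> T callWithBackoff(Callable<T> operation, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1 || baseDelayMillis < 0) {
            throw new IllegalArgumentException("maxAttempts must be >= 1 and delay >= 0");
        }
        long delay = baseDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (ServerBusyException busy) {
                if (attempt >= maxAttempts) {
                    throw busy; // give up and surface the last busy error
                }
                long jitter = ThreadLocalRandom.current().nextLong(delay / 2 + 1);
                Thread.sleep(delay + jitter);
                delay *= 2;
            }
        }
    }
}
```

Only the busy error is retried here; any other exception propagates immediately, since retrying a non-transient failure just adds load.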

AWS • S3 • SQS • More mature Java SDKs

Storage - experience • Use the latest APIs and SDKs (the Java ones
are not as good as the .NET ecosystem, but they are improving!) • Take care of the storage life cycle • Be ready for migration

Dynamic approach

SLA - Service Level Agreement

SLA “An SLA is a contractual agreement between a service
provider and a customer buying a service.” What Are The Chances An Availability SLA Will Be Violated? [1]

SLA “The agreement stipulates some minimum Quality of Service (QOS)
requirement.” What Are The Chances An Availability SLA Will Be Violated? [1]

SLA • How to understand it?

SLA Example Let's assume that we need both services at
the same time: • DB 99.99% • Storage 99.99% What's our SLA? • Use uptime approach

Probability When two events, A and B, are independent, the
probability of both occurring is: P(A and B) = P(A) * P(B) MathGoodies.com

Uptime approach DB↑ - DB is up S↑ - Storage
is up P(DB↑) = 0.9999 P(S↑) = 0.9999 P(DB↑) * P(S↑) = 0.9999 * 0.9999

Uptime approach 99.980001%
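The uptime calculation above can be reproduced with a short Java sketch. It assumes, as the slides do, that the services fail independently and that all of them are needed at the same time; the class and method names are illustrative only.

```java
import java.util.Locale;

/** Composite availability when every listed service must be up at once (sketch). */
final class SerialSla {
    private SerialSla() {}

    /** Multiplies the individual availabilities, assuming independent failures. */
    static double allRequired(double... availabilities) {
        double combined = 1.0;
        for (double availability : availabilities) {
            if (availability < 0.0 || availability > 1.0) {
                throw new IllegalArgumentException("availability must be in [0, 1]");
            }
            combined *= availability;
        }
        return combined;
    }

    public static void main(String[] args) {
        double sla = allRequired(0.9999, 0.9999); // DB and Storage
        System.out.println(String.format(Locale.ROOT, "%.6f%%", sla * 100)); // prints 99.980001%
    }
}
```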

So let's go multiregion!

Multiregion SLA Traffic Manager 99.99% Europe↑ - up US↑ -
up Europe↓ - down US↓ - down P(Europe↑ or US↑) = 1 - P(Europe↓ and US↓) = 1 - P(Europe↓) * P(US↓)

Multiregion SLA P(Europe↓) = P(US↓) = 1 - 0.99980001 =
0.00019999

Multiregion SLA P(Europe↑ or US↑) = 1 - 0.00019999 *
0.00019999

Multiregion SLA 99.999996%

Multiregion SLA TM↑ - Traffic Manager is up TM= 99.99%
P(TM↑) * P(Europe↑ or US↑) = 0.9999 * 0.99999996

Multiregion SLA 99.989996%
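The whole multiregion derivation fits in a small Java sketch: each region is DB and Storage in series, the two regions are redundant, and Traffic Manager sits in front of both. It assumes independent failures throughout, as the slides do; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Composite SLA for two redundant regions behind a traffic manager (sketch). */
final class MultiRegionSla {
    private MultiRegionSla() {}

    /** All components are required: multiply the availabilities. */
    static double serial(double... availabilities) {
        double up = 1.0;
        for (double availability : availabilities) {
            up *= availability;
        }
        return up;
    }

    /** At least one component is enough: 1 minus the probability that all are down. */
    static double anyOf(double... availabilities) {
        double allDown = 1.0;
        for (double availability : availabilities) {
            allDown *= (1.0 - availability);
        }
        return 1.0 - allDown;
    }

    public static void main(String[] args) {
        double region = serial(0.9999, 0.9999);     // DB and Storage in one region
        double eitherRegion = anyOf(region, region); // Europe or US is up
        double total = serial(0.9999, eitherRegion); // Traffic Manager in front
        System.out.println(String.format(Locale.ROOT, "%.6f%%", eitherRegion * 100)); // prints 99.999996%
        System.out.println(String.format(Locale.ROOT, "%.6f%%", total * 100));        // prints 99.989996%
    }
}
```

The result shows the point of the slides: going multiregion lifts the regional part to 99.999996%, but the single Traffic Manager in front caps the end-to-end figure just below its own 99.99%.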

Complexity is deceptively simple

Call to action! 1. Calculate the SLA for your system. 2.
Play with HA features and be ready for failure. 3. Check whether you are using cloud geo infrastructure in a way that fits your HA needs.

Do We Dream of Stateless Apps?

Bibliography
• https://nofluffjuststuff.com/magazine/2017/10/cloud_native_apps_must_be_stateless_myth_or_fact_
• https://www.researchgate.net/publication/221056071_What_Are_the_Chances_an_Availability_SLA_will_be_Violated
• https://vincentlauzon.com/2018/01/22/solution-slas-in-azure/
• http://tuxlabs.com/?p=267
• https://www.bizety.com/2018/08/21/stateful-vs-stateless-architecture-overview/
• https://www.xenonstack.com/insights/stateful-and-stateless-applications/
• https://docs.microsoft.com/en-us/azure/architecture/aws-professional/
• https://cloudacademy.com/blog/aws-regions-and-availability-zones-the-simplest-explanation-you-will-ever-find-around/

Bibliography
• https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
• https://docs.aws.amazon.com/en_pv/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
• https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions
• https://docs.microsoft.com/en-us/azure/sql-database/sql-database-disaster-recovery-strategies-for-applications-with-elastic-pool
• https://azure.microsoft.com/en-us/blog/azure-sql-databases-disaster-recovery-101/
• https://docs.microsoft.com/en-us/azure/sql-database/sql-database-high-availability
• https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group
Graphics attributions:
• https://www.behance.net/gallery/64391071/Tinkoff-systems?tracking_source=search%7Ccyber%20map
• https://www.studiodaily.com/2018/01/designing-retro-tech-look-future-blade-runner-2049/
• https://www.behance.net/gallery/42879183/URBS?tracking_source=search%7Cfuturistic%20warehouse

Slides and code • Code: https://github.com/rauluka/state-runner-code
