Do Developers Dream
of Stateless Apps?
NDC Oslo 2020, June 11th by Łukasz Gebel @rauluka7
Slide 2
Slide 2 text
No content
Slide 3
Slide 3 text
No content
Slide 4
Slide 4 text
Do We Dream of
Stateless Apps?
Slide 5
Slide 5 text
No content
Slide 6
Slide 6 text
No content
Slide 7
Slide 7 text
No content
Slide 8
Slide 8 text
No content
Slide 9
Slide 9 text
No content
Slide 10
Slide 10 text
No content
Slide 11
Slide 11 text
State
“A condition or way of being that exists at a particular time.”
Cambridge Dictionary
Slide 12
Slide 12 text
State - computer science
“(...) a program is described as stateful if it is designed
to remember preceding events or user interactions.”
Wikipedia
Slide 13
Slide 13 text
State - computer science
“The remembered information is called the state of the
system.”
Wikipedia
Slide 14
Slide 14 text
State, state everywhere
Slide 15
Slide 15 text
State, state everywhere
Slide 16
Slide 16 text
State, state everywhere
Slide 17
Slide 17 text
State, state everywhere
Slide 18
Slide 18 text
Stateful vs Stateless
Slide 19
Slide 19 text
Fully Stateless
Slide 20
Slide 20 text
Stateful or Stateless?
Slide 21
Slide 21 text
Fully Stateful
Slide 22
Slide 22 text
Awfully Stateful
Slide 23
Slide 23 text
High availability - strategies
Slide 24
Slide 24 text
Scaling
• Vertical
• Horizontal
Slide 25
Slide 25 text
Multi-region
Slide 26
Slide 26 text
Multi-region - basic units
Slide 27
Slide 27 text
Multi-region Azure vs AWS
Azure
• 60 Regions...
Slide 28
Slide 28 text
Multi-region Azure vs AWS
Azure
• 60 Regions... no, Azure, you
have 53 ;)
Slide 29
Slide 29 text
Multi-region Azure vs AWS
Azure
• 60 Regions... no, Azure, you
have 53 ;)
• 10 regions with Availability
Zones
Slide 30
Slide 30 text
Multi-region Azure vs AWS
Azure
• 60 Regions... no, Azure, you
have 53 ;)
• 10 regions with Availability
Zones
• Minimum 3 AZs per region
Slide 31
Slide 31 text
Multi-region Azure vs AWS
Azure
• 60 Regions... no, Azure, you
have 53 ;)
• 10 regions with Availability
Zones
• Minimum 3 AZs per region
AWS
• 23 Regions
Slide 32
Slide 32 text
Multi-region Azure vs AWS
Azure
• 60 Regions... no, Azure, you
have 53 ;)
• 10 regions with Availability
Zones
• Minimum 3 AZs per region
AWS
• 23 Regions
• 70 Availability Zones
Slide 33
Slide 33 text
Multi-region Azure vs AWS
Azure
• 60 Regions... no, Azure, you
have 53 ;)
• 10 regions with Availability
Zones
• Minimum 3 AZs per region
AWS
• 23 Regions
• 70 Availability Zones
• 2 or more AZs (except
Osaka region)
Slide 34
Slide 34 text
VM Placement Strategies
• Fault domains
• Update domains
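In an Azure availability set, a fault domain groups VMs that share power and network hardware, and an update domain groups VMs that are rebooted together during planned maintenance. The sketch below assumes a simple round-robin assignment and example counts of 3 fault domains and 5 update domains; `place` and `survivors` are hypothetical helpers, not an Azure API. It shows how many VMs keep running when one domain is lost.

```python
# Sketch: round-robin placement of VMs across fault domains (FDs) and
# update domains (UDs). The counts are example values (assumption);
# check the actual limits for your region and availability set.
FAULT_DOMAINS = 3
UPDATE_DOMAINS = 5


def place(vm_count: int, fds: int = FAULT_DOMAINS, uds: int = UPDATE_DOMAINS):
    """Return one (fault_domain, update_domain) pair per VM, assigned round-robin."""
    return [(i % fds, i % uds) for i in range(vm_count)]


def survivors(placement, failed_fd=None, updating_ud=None):
    """Count VMs still running while one FD is down and/or one UD is rebooting."""
    return sum(
        1
        for fd, ud in placement
        if fd != failed_fd and ud != updating_ud
    )


if __name__ == "__main__":
    vms = place(6)
    print(vms)
    print("FD 0 lost:", survivors(vms, failed_fd=0))
    print("UD 0 rebooting:", survivors(vms, updating_ud=0))
```

Spreading instances over both kinds of domains means a hardware fault and a planned update each take out only a slice of the fleet, not all of it.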
Slide 35
Slide 35 text
VM Placement Strategies
Slide 36
Slide 36 text
Why should we care?
• Distributed systems need coordination:
• distribute configuration
• synchronize state
• 2n + 1 machines tolerate n failures
• Minimal HA setup = 3 instances
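The 2n + 1 rule comes from majority quorums: a coordination service such as ZooKeeper stays available only while more than half of its members can reach each other. The small sketch below (function names are illustrative) shows why the minimal HA setup is 3 instances, why tolerating N = 2 failures needs 5, and why an even-sized ensemble adds nothing.

```python
# Sketch: majority-quorum sizing for coordination services.
# An ensemble of 2n + 1 members keeps a majority (n + 1) alive after n failures.


def ensemble_size(failures_to_tolerate: int) -> int:
    """Members needed to survive `failures_to_tolerate` crashed members."""
    return 2 * failures_to_tolerate + 1


def quorum(members: int) -> int:
    """Smallest strict majority of `members`."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 2):
        size = ensemble_size(n)
        print(f"n={n}: {size} members, quorum {quorum(size)}")
    # An even-sized ensemble buys nothing extra: 4 members tolerate only 1 failure.
    print("4 members tolerate", tolerated_failures(4), "failure(s)")
```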
Slide 37
Slide 37 text
No content
Slide 38
Slide 38 text
No content
Slide 39
Slide 39 text
I DON'T WANT TO
LIVE ON THIS
PLANET ANYMORE...
Slide 40
Slide 40 text
No content
Slide 41
Slide 41 text
So let's scale ZooKeeper!
N = 2, 2N + 1 = 5
Slide 42
Slide 42 text
No content
Slide 43
Slide 43 text
NOOOOOOOOOOOO!!!!!!!
Slide 44
Slide 44 text
Highly Available Cloud
Services
Slide 45
Slide 45 text
Database
Slide 46
Slide 46 text
Azure SQL
• DB as a managed service
• Microsoft SQL Server Database Engine
• Scalability & High availability features
Slide 47
Slide 47 text
Scalability
Dynamic scaling
• not equal to autoscaling!
• manual process
Slide 48
Slide 48 text
Scalability
• Automatic scaling
• Use an elastic pool - databases share assigned resources
• Implement custom solution based on DB metrics
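The "custom solution based on DB metrics" option could look roughly like the rule below. Everything in it is an assumption for illustration: the tier names (example DTU service objectives), the CPU thresholds and the `next_tier` function. It only decides the target tier; applying it would go through your provider's management API.

```python
# Sketch of a custom, metric-driven scaling rule for a single database.
# Tier names, thresholds and the metric are hypothetical examples.
from statistics import mean
from typing import Sequence

TIERS = ["S0", "S1", "S2", "S3"]  # example service objectives, smallest first
SCALE_UP_ABOVE = 80.0    # average CPU %, hypothetical threshold
SCALE_DOWN_BELOW = 20.0  # average CPU %, hypothetical threshold


def next_tier(current: str, cpu_samples: Sequence[float]) -> str:
    """Return the tier to use next, moving at most one step at a time."""
    if not cpu_samples:
        return current  # no data, no decision
    avg = mean(cpu_samples)
    idx = TIERS.index(current)
    if avg > SCALE_UP_ABOVE and idx < len(TIERS) - 1:
        return TIERS[idx + 1]
    if avg < SCALE_DOWN_BELOW and idx > 0:
        return TIERS[idx - 1]
    return current
```

Averaging over a window and moving one step at a time are choices meant to avoid flapping; a real rule would also need a cooldown between changes, since each scale operation takes time to complete.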
Slide 49
Slide 49 text
High Availability Features
• Active Geo-Replication
• Failover Groups
Slide 50
Slide 50 text
Active Geo-Replication
Primary Database
Slide 51
Slide 51 text
Active Geo-Replication
Secondary Database in the same or a different region
Slide 52
Slide 52 text
Active Geo-Replication
Asynchronous data replication
Slide 53
Slide 53 text
Active Geo-Replication
Manual failover to secondary database
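Asynchronous replication means the primary acknowledges a write before the secondary has received it. The toy model below (class and method names are made up for illustration, not an Azure SDK) shows what that implies for a manual failover: writes still in flight at the moment of failover are lost.

```python
# Toy model of asynchronous replication with a manual failover.
# Writes are acknowledged by the primary immediately and shipped to the
# secondary later, so anything not yet shipped is lost on failover.
from dataclasses import dataclass, field


@dataclass
class Replica:
    log: list = field(default_factory=list)


@dataclass
class GeoReplicatedDb:
    primary: Replica = field(default_factory=Replica)
    secondary: Replica = field(default_factory=Replica)

    def write(self, record) -> None:
        # Committed on the primary only; replication happens later.
        self.primary.log.append(record)

    def replicate(self, max_records: int) -> None:
        # Ship up to `max_records` pending records to the secondary.
        start = len(self.secondary.log)
        self.secondary.log.extend(self.primary.log[start:start + max_records])

    def failover(self) -> list:
        # Promote the secondary and return the writes that never reached it.
        # In this toy model the old primary is simply discarded.
        lost = self.primary.log[len(self.secondary.log):]
        self.primary, self.secondary = self.secondary, Replica()
        return lost
```

For example, after five writes and only three replicated, `failover()` promotes the secondary and reports the last two writes as lost.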
Slide 54
Slide 54 text
Failover groups
Secondary DB by default in another region
West Europe
North Europe
Slide 55
Slide 55 text
Failover groups
Automatic failover
West Europe
North Europe
Slide 56
Slide 56 text
Failover groups
Single connection string pointing to the current primary DB
West Europe
North Europe
YourPrimaryDB.com
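With a failover group the application keeps one connection string, and the listener's DNS name follows the current primary. Per Azure's documentation, the read-write listener is `<failover-group-name>.database.windows.net` and the read-only listener is `<failover-group-name>.secondary.database.windows.net`. The helper below is a hypothetical sketch that builds an ODBC-style connection string; the group name, database name and driver name are placeholder assumptions, and authentication settings are omitted.

```python
# Hypothetical helper: target a failover group listener instead of a
# specific server, so a failover needs no client configuration change.


def failover_group_conn_str(group: str, database: str, read_only: bool = False) -> str:
    # The read-write listener always resolves to the current primary;
    # the read-only listener resolves to the secondary.
    if read_only:
        host = f"{group}.secondary.database.windows.net"
    else:
        host = f"{group}.database.windows.net"
    parts = [
        "Driver={ODBC Driver 17 for SQL Server}",  # assumed installed driver
        f"Server=tcp:{host},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"
```

Clients should still retry transient errors, because open connections are dropped while the roles switch.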
Slide 57
Slide 57 text
Paired regions
• Physical isolation - if possible, at least 300 miles apart
• Region recovery order - if multiple regions fail, one region of each
pair is prioritized for recovery
• Sequential updates - planned updates roll out to one region of a
pair at a time, to minimize the impact of bugs or breaking changes
West Europe North Europe
Slide 58
Slide 58 text
Demo time!
Slide 59
Slide 59 text
Azure SQL - my experience
Slide 60
Slide 60 text
But there is hope...
Azure SQL has recently gained new features:
• Business Critical Tier
• Zone Redundant deployments
Slide 61
Slide 61 text
Amazon RDS
• Amazon Aurora, MySQL, PostgreSQL ...
• Multi-AZ deployment
• Synchronous replication to a standby instance in a different
AZ
Slide 62
Slide 62 text
No content
Slide 63
Slide 63 text
Cosmos DB
Slide 64
Slide 64 text
Cosmos DB
• Multi-model DB
• API: SQL, MongoDB, Cassandra, Gremlin and more
• Document, table (row), graph, key-value
Slide 65
Slide 65 text
High Availability Features
• Single master - multiple readers replication
• Multi-master replication
• Add and remove regions on the go
Slide 66
Slide 66 text
Cosmos DB - High availability
• Multi-homing API - interact with the replica that is closest to
you
• Regional failovers
Slide 67
Slide 67 text
HA - behind the scenes
• Partitions are regionally redundant
• Within a region, every partition is replicated
• Replicas are spread across 10 - 20 fault domains
Replicas per partition = 4 replicas X number of regions
Slide 68
Slide 68 text
Demo time!
Slide 69
Slide 69 text
Cosmos DB - limitations
• Maximum document size is 2 MB
• Only 5 geospatial functions
• Remember - it's a document-based DB
• Consistency levels - choose one and handle its trade-offs
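Cosmos DB offers five consistency levels, from strongest to weakest: Strong, Bounded staleness, Session, Consistent prefix and Eventual. The account has a default level, and an individual request may relax it but not strengthen it. The sketch below encodes that rule; `Consistency` and `effective_level` are illustrative names, not part of the Cosmos DB SDK.

```python
from enum import IntEnum
from typing import Optional


class Consistency(IntEnum):
    """Cosmos DB consistency levels; a higher value means a stronger guarantee."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def effective_level(account_default: Consistency,
                    requested: Optional[Consistency] = None) -> Consistency:
    """Level a single request runs with: the account default unless the
    request relaxes it. Asking for a stronger level is rejected here."""
    if requested is None:
        return account_default
    if requested > account_default:
        raise ValueError(
            f"cannot strengthen {account_default.name} to {requested.name} per request"
        )
    return requested
```

Whichever level you choose, the application must handle what that level permits, for example stale reads under Eventual, or carrying the session token between requests under Session.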
Slide 70
Slide 70 text
Amazon DynamoDB
• Key-value, document-based DB
• Multi-region
• Multi-master
Slide 71
Slide 71 text
Storage
Slide 72
Slide 72 text
Storage
• Distributed resources, assets
• Share resources over HTTPS
• Backups, logs
• Long-lived resources
Slide 73
Slide 73 text
Locally-redundant storage
3 copies within a single datacentre
Slide 74
Slide 74 text
Geo-redundant storage
3 copies in the primary region, then copied asynchronously to a secondary region
Slide 75
Slide 75 text
Read-access GRS
You can read from the secondary region ;)
Slide 76
Slide 76 text
Azure Storage V2
Slide 77
Slide 77 text
Azure Storage V2
• Zone-redundant storage (ZRS) - 3 storage clusters in different AZs
• Geo-zone-redundant storage (GZRS) - ZRS + secondary
region
• Still, reading from the secondary region requires RA-GZRS ;)
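The redundancy options above differ in how many copies exist, where they live, and whether the secondary is readable. The sketch below encodes that as data, following Azure's documentation (LRS and ZRS keep 3 copies; the geo options add 3 more in the secondary region). `OPTIONS` and `candidates` are illustrative names, and the sketch ignores cost, performance and which services support each option.

```python
from typing import NamedTuple


class Redundancy(NamedTuple):
    copies: int               # total copies of the data
    zone_redundant: bool      # copies spread across AZs in the primary region
    geo_redundant: bool       # asynchronously copied to a secondary region
    readable_secondary: bool  # reads allowed from the secondary region


OPTIONS = {
    "LRS":     Redundancy(3, False, False, False),
    "ZRS":     Redundancy(3, True,  False, False),
    "GRS":     Redundancy(6, False, True,  False),
    "RA-GRS":  Redundancy(6, False, True,  True),
    "GZRS":    Redundancy(6, True,  True,  False),
    "RA-GZRS": Redundancy(6, True,  True,  True),
}


def candidates(zone_redundant: bool, geo_redundant: bool, read_secondary: bool):
    """Sorted names of the options that satisfy every requested property."""
    return sorted(
        name for name, r in OPTIONS.items()
        if (r.zone_redundant or not zone_redundant)
        and (r.geo_redundant or not geo_redundant)
        and (r.readable_secondary or not read_secondary)
    )
```

For example, asking for zone and geo redundancy without secondary reads leaves only GZRS and RA-GZRS.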
Slide 78
Slide 78 text
Azure queue
• Integrate different parts of a system or different systems
• Infinite time to live (TTL)
• 500 TB
• Order is not guaranteed (Service Bus message sessions provide
it)
Slide 79
Slide 79 text
Azure queue
• Unlimited number of concurrent clients
Slide 80
Slide 80 text
And suddenly
• 503 Server busy ...
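A 503 "Server Busy" from Azure Storage typically means the request was throttled, and the usual remedy is to retry with exponential backoff. The Azure and AWS SDKs ship configurable retry policies, so prefer those where available; the sketch below only illustrates the idea. `ServerBusyError` is a hypothetical stand-in for whatever exception your client raises, and the delays are example values.

```python
# Sketch: retry a throttled operation with exponential backoff and full jitter.
import random
import time


class ServerBusyError(Exception):
    """Hypothetical stand-in for an SDK's 503 "Server Busy" exception."""


def with_backoff(op, max_attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    for attempt in range(max_attempts):
        try:
            return op()
        except ServerBusyError:
            if attempt == max_attempts - 1:
                raise  # give up and let the caller decide
            # Delay ceiling grows as base * 2^attempt (capped); the random
            # jitter keeps many clients from retrying in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable only to make the function easy to test without real waiting.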
Slide 81
Slide 81 text
AWS
• S3
• SQS
• More mature Java SDKs
Slide 82
Slide 82 text
Storage - experience
• Use the latest APIs and SDKs (the Java ones are not as good as those in the .NET
ecosystem, but they are improving!)
• Take care of the storage life cycle
• Be ready for migration
Slide 83
Slide 83 text
Proxy
Slide 84
Slide 84 text
Dynamic approach
Slide 85
Slide 85 text
SLA - Service Level
Agreement
Slide 86
Slide 86 text
SLA
“An SLA is a contractual agreement between a service
provider and a customer buying a service.”
What Are The Chances An Availability SLA Will Be Violated? [1]
Slide 87
Slide 87 text
SLA
“The agreement stipulates some minimum Quality of
Service (QOS) requirement.”
What Are The Chances An Availability SLA Will Be Violated? [1]
Slide 88
Slide 88 text
SLA
• How should we interpret it?
Slide 89
Slide 89 text
SLA Example
Let's assume that we need both services at the same time:
• DB 99.99%
• Storage 99.99%
What's our SLA?
• Use the uptime approach
Slide 90
Slide 90 text
Probability
When two events, A and B, are independent,
the probability of both occurring is:
P(A and B) = P(A) * P(B)
MathGoodies.com
Slide 91
Slide 91 text
Uptime approach
DB↑ - DB is up S↑ - Storage is up
P(DB↑) = 0.9999 P(S↑) = 0.9999
P(DB↑) * P(S↑) = 0.9999 * 0.9999
Slide 92
Slide 92 text
Uptime approach
99.980001%
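For services that must all be up at the same time (a series composition), the uptime approach multiplies their availabilities, assuming independent failures. Converting the result into allowed downtime makes it easier to reason about; the sketch below uses a 365-day year and illustrative function names.

```python
# Composite SLA for services that must ALL be up, assuming independent
# failures, plus the yearly downtime that SLA permits.
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def series_sla(*slas: float) -> float:
    """P(all up) = product of the individual availabilities."""
    return prod(slas)


def allowed_downtime_minutes(sla: float) -> float:
    return (1 - sla) * MINUTES_PER_YEAR


if __name__ == "__main__":
    sla = series_sla(0.9999, 0.9999)  # DB and Storage
    print(f"composite SLA: {sla:.6%}")
    print(f"allowed downtime: {allowed_downtime_minutes(sla):.1f} min/year")
```

For these inputs the composite SLA allows roughly 105 minutes of downtime per year, about twice the roughly 53 minutes a single 99.99% service allows.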
Slide 93
Slide 93 text
So let's go multiregion!
Slide 94
Slide 94 text
Multiregion SLA
Traffic Manager 99.99%
Europe↑ - up US↑ - up
Europe↓ - down US↓ - down
P(Europe↑ or US↑) = 1 - P(Europe↓ and US↓)
Slide 95
Slide 95 text
Multiregion SLA
Traffic Manager 99.99%
Europe↑ - up US↑ - up
Europe↓ - down US↓ - down
P(Europe↑ or US↑) = 1 - P(Europe↓) * P(US↓)
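Multiregion SLA
Europe = US = 99.980001% (DB + Storage), so P(Europe↓) = P(US↓) = 0.00019999
P(Europe↑ or US↑) = 1 - 0.00019999 * 0.00019999 ≈ 0.99999996
TM↑ - Traffic Manager is up
P(TM↑) = 0.9999
P(TM↑) * P(Europe↑ or US↑) = 0.9999 * 0.99999996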
Slide 100
Slide 100 text
Multiregion SLA
99.989996%
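The multi-region numbers follow from two rules: multiply availabilities of components that are all required, and use 1 - P(all down) for redundant copies. The sketch below reproduces the slides' figures under the same assumptions (independent failures, each region running DB + Storage at 99.99% each, Traffic Manager at 99.99%); the function names are illustrative.

```python
from math import prod


def series(*slas):
    """All components required: multiply availabilities."""
    return prod(slas)


def parallel(*slas):
    """Any one redundant copy suffices: 1 - P(all down)."""
    return 1 - prod(1 - s for s in slas)


if __name__ == "__main__":
    region = series(0.9999, 0.9999)          # DB + Storage in one region
    both_regions = parallel(region, region)  # Europe or US
    total = series(0.9999, both_regions)     # Traffic Manager in front
    print(f"{both_regions:.8f}")
    print(f"{total:.6%}")
```

The result is slightly lower than Traffic Manager's own 99.99%: the redundant regions are nearly perfect, so the single non-redundant component in front dominates. Correlated regional failures would make the real figure worse than this independent-failure estimate.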
Slide 101
Slide 101 text
Complexity is deceptively
simple
Slide 102
Slide 102 text
No content
Slide 103
Slide 103 text
No content
Slide 104
Slide 104 text
No content
Slide 105
Slide 105 text
No content
Slide 106
Slide 106 text
No content
Slide 107
Slide 107 text
No content
Slide 108
Slide 108 text
Call to action!
1. Calculate the SLA for your system.
Slide 109
Slide 109 text
Call to action!
1. Calculate the SLA for your system.
2. Play with HA features and be ready for failure.
Slide 110
Slide 110 text
Call to action!
1. Calculate the SLA for your system.
2. Play with HA features and be ready for failure.
3. Check whether you are using cloud geo-infrastructure in a
way that fits your HA needs.