Slide 1

Slide 1 text

Do Developers Dream of Stateless Apps? NDC Oslo 2020, June 11th by Łukasz Gebel @rauluka7

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Do We Dream of Stateless Apps?

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

State “A condition or way of being that exists at a particular time.” Cambridge Dictionary

Slide 12

Slide 12 text

State - computer science “(...) a program is described as stateful if it is designed to remember preceding events or user interactions.” Wikipedia

Slide 13

Slide 13 text

State - computer science “The remembered information is called the state of the system.” Wikipedia

Slide 14

Slide 14 text

State, state everywhere

Slide 15

Slide 15 text

State, state everywhere

Slide 16

Slide 16 text

State, state everywhere

Slide 17

Slide 17 text

State, state everywhere

Slide 18

Slide 18 text

Stateful vs Stateless

Slide 19

Slide 19 text

Fully Stateless

Slide 20

Slide 20 text

Stateful or Stateless?

Slide 21

Slide 21 text

Fully Stateful

Slide 22

Slide 22 text

Awfully Stateful

Slide 23

Slide 23 text

High availability - strategies

Slide 24

Slide 24 text

Scaling • Vertical • Horizontal

Slide 25

Slide 25 text

Multi-region

Slide 26

Slide 26 text

Multi-region - basic units

Slide 27

Slide 27 text

Multi-region Azure vs AWS Azure • 60 Regions...

Slide 28

Slide 28 text

Multi-region Azure vs AWS Azure • 60 Regions... no Azure you have 53 ;)

Slide 29

Slide 29 text

Multi-region Azure vs AWS Azure • 60 Regions... no Azure you have 53 ;) • 10 regions with Availability Zones

Slide 30

Slide 30 text

Multi-region Azure vs AWS Azure • 60 Regions... no Azure you have 53 ;) • 10 regions with Availability Zones • Minimum 3xAZ per region

Slide 31

Slide 31 text

Multi-region Azure vs AWS Azure • 60 Regions... no Azure you have 53 ;) • 10 regions with Availability Zones • Minimum 3xAZ per region AWS • 23 Regions

Slide 32

Slide 32 text

Multi-region Azure vs AWS Azure • 60 Regions... no Azure you have 53 ;) • 10 regions with Availability Zones • Minimum 3xAZ per region AWS • 23 Regions • 70 Availability Zones

Slide 33

Slide 33 text

Multi-region: Azure vs AWS. Azure: • 60 Regions... no Azure you have 53 ;) • 10 regions with Availability Zones • Minimum 3xAZ per region. AWS: • 23 Regions • 70 Availability Zones • 2 or more AZ (except Osaka region)

Slide 34

Slide 34 text

VM Placement Strategies • Fault domains • Update domains

Slide 35

Slide 35 text

VM Placement Strategies

Slide 36

Slide 36 text

Why should we care? • Distributed systems need coordination: • distribute configuration • synchronize state • 2n + 1 machines • Minimal HA setup = 3 instances
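
The 2n + 1 rule on this slide can be sketched in a few lines. This is a minimal illustration of majority-quorum sizing in general, not code from the talk or from any specific coordination service; the function names are hypothetical.

```python
def ensemble_size(failures_to_survive: int) -> int:
    """Machines needed so a strict majority survives the given number of failures."""
    if failures_to_survive < 0:
        raise ValueError("failures_to_survive must be non-negative")
    return 2 * failures_to_survive + 1


def quorum(size: int) -> int:
    """Smallest strict majority of an ensemble with `size` machines."""
    return size // 2 + 1


def max_failures(size: int) -> int:
    """How many machines may be down while a majority is still reachable."""
    return (size - 1) // 2


for n in (1, 2):
    size = ensemble_size(n)
    print(f"n={n}: {size} machines, quorum of {quorum(size)}")
```

Note that an even-sized ensemble buys nothing extra: four machines still tolerate only one failure, which is why the minimal HA setup is three instances and the next useful step is five.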

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

I DON'T WANT TO LIVE ON THIS PLANET ANYMORE...

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

So let's scale ZooKeepers! N = 2, 2N + 1 = 5

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

NOOOOOOOOOOOO!!!!!!!

Slide 44

Slide 44 text

Highly Available Cloud Services

Slide 45

Slide 45 text

Database

Slide 46

Slide 46 text

Azure SQL • DB as a managed service • Microsoft SQL Server Database Engine • Scalability & High availability features

Slide 47

Slide 47 text

Scalability Dynamic scaling • not equal to autoscaling! • manual process

Slide 48

Slide 48 text

Scalability • Automatic scaling • Use elastic pool - databases share assigned resources • Implement custom solution based on DB metrics

Slide 49

Slide 49 text

High Availability Features • Active Geo-Replication • Failover Groups

Slide 50

Slide 50 text

Active Geo-Replication Primary Database

Slide 51

Slide 51 text

Active Geo-Replication Secondary Database in the same or different region

Slide 52

Slide 52 text

Active Geo-Replication Asynchronous data replication

Slide 53

Slide 53 text

Active Geo-Replication Manual failover to secondary database

Slide 54

Slide 54 text

Failover groups Secondary DB by default in other region West Europe North Europe

Slide 55

Slide 55 text

Failover groups Automatic failover West Europe North Europe

Slide 56

Slide 56 text

Failover groups Single connection string directing to current primary db West Europe North Europe YourPrimaryDB.com

Slide 57

Slide 57 text

Paired regions • Physical isolation - if possible, at least 300 miles apart • Region recovery order - if multiple regions fail, one region of each pair is prioritized for recovery • Sequential updates - minimize the impact of bugs or breaking changes West Europe North Europe

Slide 58

Slide 58 text

Demo time!

Slide 59

Slide 59 text

Azure SQL - my experience

Slide 60

Slide 60 text

But there is hope... Azure SQL gained new features recently: • Business Critical Tier • Zone Redundant deployments

Slide 61

Slide 61 text

Amazon RDS • Amazon Aurora, MySQL, PostgreSQL ... • Multi-AZ deployment • Synchronous replication to a standby instance in a different AZ

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

Cosmos DB

Slide 64

Slide 64 text

Cosmos DB • Multi-model DB • API: SQL, MongoDB, Cassandra, Gremlin and more • Document-based, table-row, graph, key-value

Slide 65

Slide 65 text

High Availability Features • Single master - multiple readers replication • Multi master replication • Add and remove regions on the go

Slide 66

Slide 66 text

Cosmos DB - High availability • Multi-homing API - interact with the replica that is closest to you • Regional failovers

Slide 67

Slide 67 text

HA - behind the scenes • Partitions are regionally redundant • Within a region every partition is replicated • Replicas have 10 - 20 fault domains 4 replicas X REGIONS

Slide 68

Slide 68 text

Demo time!

Slide 69

Slide 69 text

Cosmos DB - limitations • Maximum document size is 2 MB • Only 5 geospatial functions • Remember - it's a document-based DB • Consistency levels - choose and handle
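
One way to live with the document size limit is to check the serialized size before writing. The sketch below uses only the Python standard library and does not call any Cosmos DB SDK. It rests on two labeled assumptions: the 2 MB figure from the slide is read as binary megabytes, and the UTF-8 JSON length is used as an approximation of what the service counts. The helper names are hypothetical.

```python
import json

# 2 MB as quoted on the slide; treating it as 2 * 1024 * 1024 bytes is an assumption.
MAX_DOCUMENT_BYTES = 2 * 1024 * 1024


def document_size_bytes(document: dict) -> int:
    """Size of the document when serialized as UTF-8 JSON."""
    return len(json.dumps(document, ensure_ascii=False).encode("utf-8"))


def fits_size_limit(document: dict, limit: int = MAX_DOCUMENT_BYTES) -> bool:
    """True if the serialized document is within the assumed size limit."""
    return document_size_bytes(document) <= limit


small = {"id": "1", "payload": "x" * 100}
large = {"id": "2", "payload": "x" * (3 * 1024 * 1024)}

print(fits_size_limit(small))  # True
print(fits_size_limit(large))  # False
```

A document that fails the check has to be split or have its large payload moved elsewhere (for example to blob storage), with only a reference kept in the document.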

Slide 70

Slide 70 text

Amazon DynamoDB • key-value, document based DB • multiregion • multimaster

Slide 71

Slide 71 text

Storage

Slide 72

Slide 72 text

Storage • Distributed resources, assets • Share resources over HTTPS • Backups, logs • Long-lived resources

Slide 73

Slide 73 text

Locally-redundant storage 3x within datacentre

Slide 74

Slide 74 text

Geo-redundant storage 3x primary region, then secondary region

Slide 75

Slide 75 text

Read-access GRS You can read from secondary region ;)

Slide 76

Slide 76 text

Azure Storage V2

Slide 77

Slide 77 text

Azure Storage V2 • Zone-redundant storage (ZRS) - 3 clusters in different AZ • Geo-zone-redundant storage (GZRS) - ZRS + secondary region • Still, reading requires RA-GZRS ;)

Slide 78

Slide 78 text

Azure queue • Integrate different parts of a system or different systems • Infinite time to live (TTL) • 500 TB • Order is not guaranteed (Service Bus message sessions provide it)

Slide 79

Slide 79 text

Azure queue • Unlimited number of concurrent clients

Slide 80

Slide 80 text

And suddenly • 503 Server busy ...
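
A common way to cope with throttling responses such as 503 Server Busy is to retry with exponential backoff and jitter. The sketch below is a generic illustration of that technique, not the approach shown in the talk and not the API of any Azure SDK: `ServerBusyError`, `call_with_backoff` and `flaky_enqueue` are hypothetical names, and the delays are arbitrary defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerBusyError(Exception):
    """Hypothetical stand-in for an SDK error that carries HTTP 503 Server Busy."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on ServerBusyError using capped exponential backoff with jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ServerBusyError:
            if attempt == max_attempts:
                raise  # retry budget exhausted, surface the error to the caller
            # Delay doubles per attempt, is capped, and is randomized to spread out retries.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")


# Usage: a fake operation that is busy twice and then succeeds.
attempts = {"count": 0}


def flaky_enqueue() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ServerBusyError("503 Server Busy")
    return "enqueued"


result = call_with_backoff(flaky_enqueue, sleep=lambda _seconds: None)
print(result, attempts["count"])  # enqueued 3
```

The `sleep` parameter is injectable only so the example runs instantly; production code would keep the default and usually rely on the retry policy built into the client library where one exists.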

Slide 81

Slide 81 text

AWS • S3 • SQS • More mature Java SDKs

Slide 82

Slide 82 text

Storage - experience • Use the latest APIs and SDKs (the Java ones are not as good as those in the .NET ecosystem, but they are improving!) • Take care of the storage life cycle • Be ready for migration

Slide 83

Slide 83 text

Proxy

Slide 84

Slide 84 text

Dynamic approach

Slide 85

Slide 85 text

SLA - Service Level Agreement

Slide 86

Slide 86 text

SLA “An SLA is a contractual agreement between a service provider and a customer buying a service.” What Are The Chances An Availability SLA Will Be Violated? [1]

Slide 87

Slide 87 text

SLA “The agreement stipulates some minimum Quality of Service (QOS) requirement.” What Are The Chances An Availability SLA Will Be Violated? [1]

Slide 88

Slide 88 text

SLA • How to understand it?
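
One way to make an availability percentage tangible is to convert it into a yearly downtime budget. The sketch below does that conversion. It assumes a 365-day year, and the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for sla in (99.9, 99.99, 99.999):
    print(f"{sla}% -> {allowed_downtime_minutes(sla):.2f} minutes per year")
```

For example, 99.99% leaves roughly 52.56 minutes of downtime per year, while 99.9% leaves about 525.6 minutes.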

Slide 89

Slide 89 text

SLA Example Let's assume that we need both services at the same time: • DB 99.99% • Storage 99.99% What's our SLA? • Use uptime approach

Slide 90

Slide 90 text

Probability When two events, A and B, are independent, the probability of both occurring is: P(A and B) = P(A) * P(B) MathGoodies.com

Slide 91

Slide 91 text

Uptime approach DB↑ - DB is up S↑ - Storage is up P(DB↑) = 0.9999 P(S↑) = 0.9999 P(DB↑) * P(S↑) = 0.9999 * 0.9999

Slide 92

Slide 92 text

Uptime approach 99.980001%
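
The uptime calculation above can be reproduced as a short script. It assumes, as the slides do, that the services fail independently and that the system needs all of them at once. The function name is hypothetical.

```python
import math


def serial_availability(*availabilities: float) -> float:
    """Availability of a system that needs every listed service up at the same time.

    Assumes failures are independent, so the probabilities multiply.
    """
    return math.prod(availabilities)


db = 0.9999
storage = 0.9999
combined = serial_availability(db, storage)

print(f"{combined:.6%}")  # 99.980001%
```

Every additional dependency in the chain multiplies in another factor below 1, so the combined figure can only go down.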

Slide 93

Slide 93 text

So let's go multiregion!

Slide 94

Slide 94 text

Multiregion SLA Traffic Manager 99.99% Europe↑ - up US↑ - up Europe↓ - down US↓ - down P(Europe↑ or US↑) = 1 - P(Europe↓ and US↓)

Slide 95

Slide 95 text

Multiregion SLA Traffic Manager 99.99% Europe↑ - up US↑ - up Europe↓ - down US↓ - down P(Europe↑ or US↑) = 1 - P(Europe↓) * P(US↓)

Slide 96

Slide 96 text

Multiregion SLA P(Europe↓) = P(US↓) = 1 - 0.99980001 = 0.00019999

Slide 97

Slide 97 text

Multiregion SLA P(Europe↑ or US↑) = 1 - 0.00019999 * 0.00019999

Slide 98

Slide 98 text

Multiregion SLA 99.999996%

Slide 99

Slide 99 text

Multiregion SLA TM↑ - Traffic Manager is up P(TM↑) = 0.9999 P(TM↑) * P(Europe↑ or US↑) = 0.9999 * 0.99999996

Slide 100

Slide 100 text

Multiregion SLA 99.989996%
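
The multi-region calculation can be checked end to end with the sketch below. It uses the figures from the slides (two services at 99.99% per region, two regions, Traffic Manager at 99.99%) and the same independence assumption. The function and variable names are hypothetical.

```python
def parallel_availability(*availabilities: float) -> float:
    """Probability that at least one of several independent deployments is up."""
    all_down = 1.0
    for availability in availabilities:
        all_down *= 1 - availability  # probability that this one is down as well
    return 1 - all_down


# Each region needs both DB and storage: 0.9999 * 0.9999.
region = 0.9999 * 0.9999

# The system is reachable if Europe or US is up...
either_region = parallel_availability(region, region)

# ...and Traffic Manager, which sits in front of both regions, is up too.
traffic_manager = 0.9999
total = traffic_manager * either_region

print(f"{either_region:.6%}")  # 99.999996%
print(f"{total:.6%}")          # 99.989996%
```

The second region pushes the regional figure to almost six nines, but the single Traffic Manager in front caps the total just below 99.99%.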

Slide 101

Slide 101 text

Complexity is deceptively simple

Slide 102

Slide 102 text

No content

Slide 103

Slide 103 text

No content

Slide 104

Slide 104 text

No content

Slide 105

Slide 105 text

No content

Slide 106

Slide 106 text

No content

Slide 107

Slide 107 text

No content

Slide 108

Slide 108 text

Call to action! 1. Calculate the SLA for your system.

Slide 109

Slide 109 text

Call to action! 1. Calculate the SLA for your system. 2. Play with HA features and be ready for failure.

Slide 110

Slide 110 text

Call to action! 1. Calculate the SLA for your system. 2. Play with HA features and be ready for failure. 3. Check whether you are using the cloud's geo infrastructure in a way that fits your HA needs.

Slide 111

Slide 111 text

Do We Dream of Stateless Apps?

Slide 112

Slide 112 text

No content

Slide 113

Slide 113 text

Q & A

Slide 114

Slide 114 text

Bibliography
• https://nofluffjuststuff.com/magazine/2017/10/cloud_native_apps_must_be_stateless_myth_or_fact_
• https://www.researchgate.net/publication/221056071_What_Are_the_Chances_an_Availability_SLA_will_be_Violated
• https://vincentlauzon.com/2018/01/22/solution-slas-in-azure/
• http://tuxlabs.com/?p=267
• https://www.bizety.com/2018/08/21/stateful-vs-stateless-architecture-overview/
• https://www.xenonstack.com/insights/stateful-and-stateless-applications/
• https://docs.microsoft.com/en-us/azure/architecture/aws-professional/
• https://cloudacademy.com/blog/aws-regions-and-availability-zones-the-simplest-explanation-you-will-ever-find-around/

Slide 115

Slide 115 text

Bibliography
• https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
• https://docs.aws.amazon.com/en_pv/AmazonRDS/latest/UserGuide/Concepts.RegionsAndAvailabilityZones.html
• https://docs.microsoft.com/en-us/azure/best-practices-availability-paired-regions
• https://docs.microsoft.com/en-us/azure/sql-database/sql-database-disaster-recovery-strategies-for-applications-with-elastic-pool
• https://azure.microsoft.com/en-us/blog/azure-sql-databases-disaster-recovery-101/
• https://docs.microsoft.com/en-us/azure/sql-database/sql-database-high-availability
• https://docs.microsoft.com/en-us/azure/sql-database/sql-database-auto-failover-group
Graphics attributions:
• https://www.behance.net/gallery/64391071/Tinkoff-systems?tracking_source=search%7Ccyber%20map
• https://www.studiodaily.com/2018/01/designing-retro-tech-look-future-blade-runner-2049/
• https://www.behance.net/gallery/42879183/URBS?tracking_source=search%7Cfuturistic%20warehouse

Slide 116

Slide 116 text

Slides and code • Code: https://github.com/rauluka/state-runner-code • Slides: https://speakerdeck.com/rauluka7/do-developers-dream-of-stateless-apps-d9c72762-6250-433d-b550-8b511d6440ca