Slide 1

FROM VERTICAL TO HORIZONTAL
THE CHALLENGES OF SCALABILITY IN THE CLOUD
@pyr

Slide 2

SHORT BIO
Pierre-Yves Ritschard
CTO @ exoscale - the leading Swiss public cloud provider
Open Source Developer - riemann, collectd, pallet, OpenBSD
Architect of several cloud platforms - paper.li
Recovering Operations Engineer

Slide 3

SCALABILITY
« The ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth. »

Slide 4

SCALABILITY
Culture
Organization & Process
Technical Architecture
Operations

Slide 5

SCALABILITY
Technical Architecture
Operations
Culture
Organization & Process

Slide 6

Scaling geometry
Recent history
Enter the cloud
Distributed Headaches
Architecture Drivers
Looking forward

Slide 7

QUICK NOTES
«cloud» is an umbrella term
I will conflate cloud and public IaaS
Oriented towards web application design
Applicable to all sorts of applications

Slide 8

SCALING GEOMETRY
Vertical, Horizontal and Diagonal

Slide 9

Vertical (scaling up)
Adding resources to a single node in the system.

Slide 10

Horizontal (scaling out)
Accommodating growth by adding more nodes to the system.

Slide 11

Diagonal
The most common scaling strategy: first add resources to a node, then distribute the workload across more nodes.

Slide 12

RECENT HISTORY
Leading up to IaaS

Slide 13

No content

Slide 14

Wherever possible, a great approach

Slide 15

Why stop?

Slide 16

No content

Slide 17

MOORE'S LAW
« Over the history of computing, the number of transistors on integrated circuits doubles approximately every two years. »

Slide 18

No content

Slide 19

Average core speed has been stable for several years
Consistent increase in cores per node

Slide 20

Vertical scaling challenges
424 pages

Slide 21

Vertical scaling challenges
Threads?

Slide 22

Meanwhile...

Slide 23

ENTER, THE CLOUD

Slide 24

IT as a utility

Slide 25

Programmable provisioning and decommissioning

Slide 26

Flexible node sizes (CPU, RAM, Disk)

Slide 27

Pay-as-you-go model

Slide 28

UPSIDE

Slide 29

Much lower capacity planning overhead

Slide 30

OPEX makes billing dept. happy

Slide 31

Nobody likes to change disks or rack servers

Slide 32

Switches? Gone.
VLANs? Gone.
IP allocation and translation? Gone.
OS partitioning? Gone.
OS RAID management? Gone.

Slide 33

No content

Slide 34

;; a pallet node-spec describing the node to provision
(node-spec
  :network {:inbound-ports [22, 80, 443]},
  :image {:os-family :ubuntu, :os-version-matches "12.04"},
  :hardware {:min-cores 1, :min-disk 10, :min-ram 512})

Slide 35

DOWNSIDES

Slide 36

It's hard to break out of the big iron mental model

Slide 37

It's hard to change our trust model
« I want to be able to see my servers »

Slide 38

There's still an upper limit on single node size
It is usually lower than what you had in-house

Slide 39

Beware the...

Slide 40

DISTRIBUTED HEADACHES

Slide 41

Two nodes interacting imply a distributed system
Reduces SPOFs, increases the number of failure scenarios

Slide 42

Distributed systems are subject to the CAP theorem (Brewer's theorem).
You cannot enjoy all three of: Consistency, Availability, Partition Tolerance.

Slide 43

Consistency
Simultaneous requests see a consistent set of data.

Slide 44

Availability
Each incoming request is acknowledged and receives a success or failure response.

Slide 45

Partition Tolerance
The system continues to process incoming requests even when the network partitions or nodes fail.

Slide 46

ARCHITECTURE DRIVERS
Eliminating complexity to focus on higher-order problems.

Slide 47

Composable, inspectable services
Queues over RPC
Degrade gracefully
Prefer concerned citizens
Configuration from a service registry
Nodes as immutable data structures

Slide 48

COMPOSABLE SERVICES

Slide 49

Service-oriented approach
Simplicity through decoupling

Slide 50

No internal semantics on the wire
Remember CORBA, RMI?

Slide 51

Loose contracts across service boundaries
Be liberal in what you accept, conservative in what you send.
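A minimal sketch of what the receiving side of such a loose contract can look like in Python: required fields are validated, optional fields get defaults, and unknown fields are ignored rather than rejected. The message shape and field names here are hypothetical.

def parse_mailout(msg):
    # liberal in what you accept: unknown fields are simply ignored
    return {
        'request_id': msg['request_id'],        # required, fail loudly if absent
        'recipients': list(msg['recipients']),  # required
        'content': msg.get('content', ''),      # optional, defaulted
    }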

Slide 52

Transfer data, not state
JSON has flaws, but it is the lingua franca.

Slide 53

INSPECTABLE SERVICES

Slide 54

Build inspection into services from the start
Count acknowledged, processed and failed requests.
Time actions to quickly identify hot spots.

Slide 55

Rely on unobtrusive inspection
UDP is often sufficient.

Slide 56

Leverage proven existing tools
statsd, riemann, yammer-metrics, JMX

Slide 57

@wrap_riemann("activate-account")
def activate_account(uuid, account_type, sla=None):
    if account_type == ACCOUNT_TYPE_ANONYMOUS:
        activation_status = activate_shim_account()
    elif account_type == ACCOUNT_TYPE_STANDARD:
        activation_status = activate_sla_account(uuid, sla)
    return activation_status
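The wrap_riemann decorator itself isn't shown in the deck. A minimal sketch of what it could look like, assuming the bernhard Riemann client and a local Riemann instance (the talk's actual implementation may differ):

import time
import functools
import bernhard

# assumption: UDP transport keeps the instrumentation unobtrusive
client = bernhard.Client(host='localhost', port=5555,
                         transport=bernhard.UDPTransport)

def wrap_riemann(service):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.time()
            state = 'ok'
            try:
                return fn(*args, **kwargs)
            except Exception:
                state = 'error'
                raise
            finally:
                # report the timing and outcome of every call
                client.send({'service': service,
                             'state': state,
                             'metric': time.time() - start})
        return wrapper
    return decorator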

Slide 58

QUEUES OVER RPC

Slide 59

RPC couples systems
Your service's CAP properties become tied to those of the RPC provider.

Slide 60

Take responsibility out of the callee as soon as possible
Textbook example: SMTP

Slide 61

Queues promote statelessness
{request_id: "183308a1-07e6-401f-983f-dcdcd4217ae7",
 source_host: "3c576773-870d-43fa-bfde-792f71ff6532",
 action: "mailout",
 recipients: ["foo@example.com", "bar@example.com"],
 content: "..."}
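As an illustration, a sketch of producing and consuming such a self-contained message, assuming Redis stands in for the queueing system (the queue key and helper names are hypothetical):

import json
import uuid
import redis

r = redis.Redis()  # assumption: Redis as the queueing system

def send_mail(recipients, content):
    pass  # hypothetical: hand off to the actual mail subsystem

def enqueue_mailout(source_host, recipients, content):
    # the message carries everything a worker needs: no shared state
    job = {'request_id': str(uuid.uuid4()),
           'source_host': source_host,
           'action': 'mailout',
           'recipients': recipients,
           'content': content}
    r.lpush('jobs:mailout', json.dumps(job))

def worker_loop():
    while True:
        _, raw = r.brpop('jobs:mailout')  # blocks until a job arrives
        job = json.loads(raw)
        send_mail(job['recipients'], job['content'])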

Slide 62

Queues help shape the system dynamically
Queue backlog growing? Spin new workers up!
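Continuing the sketch above, a naive autoscaler polling that backlog (provision_worker is a hypothetical call into your cloud provider's API; the threshold is arbitrary):

import time

def provision_worker():
    pass  # hypothetical: start a new worker node via the provider's API

def autoscale(r, threshold=1000, interval=30):
    # poll the backlog; when it grows past the threshold, add workers
    while True:
        if r.llen('jobs:mailout') > threshold:
            provision_worker()
        time.sleep(interval)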

Slide 63

DEGRADE GRACEFULLY

Slide 64

Embrace failure
because systems will fail, in ways you didn't expect

Slide 65

Avoid failure propagation
Implement back pressure to avoid killing loaded systems.
Queues make great pressure valves.
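A minimal in-process illustration of back pressure using a bounded stdlib queue; the bound is what turns the queue into a pressure valve (the size and timeout here are arbitrary assumptions):

import queue

inbound = queue.Queue(maxsize=1000)  # bounded on purpose

def accept(request):
    try:
        # refuse quickly instead of buffering without bound and falling over
        inbound.put(request, timeout=0.1)
        return True
    except queue.Full:
        return False  # the caller should back off and retry later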

Slide 66

Don't give up
Use connection pooling and retry policies.
Best in class: finagle, cassandra-driver
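In the spirit of those retry policies (not their actual APIs), a sketch of retrying with exponential backoff and jitter:

import time
import random

def with_retries(fn, attempts=5, base_delay=0.1):
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, give up for real
            # exponential backoff plus jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))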

Slide 67

Keep systems on
SQL down? No more account creation, but still serving existing customers.

Slide 68

PREFER CONCERNED CITIZENS

Slide 69

All moving parts in your distributed system force compromises
This is true of your own components and of external ones.

Slide 70

Choose components accordingly

Slide 71

You probably want an AP queueing system
So please avoid using MySQL as one!
Candidates: Apache Kafka, RabbitMQ, Kestrel (redis, to a lesser extent).

Slide 72

Cache locally
Much higher aggregate cache capacity
No huge SPOF
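For instance, a per-node in-process cache with the stdlib: each node caches its own hot entries, so aggregate capacity grows with the cluster and no single cache is a SPOF (the storage call is hypothetical):

import functools

@functools.lru_cache(maxsize=4096)
def account_profile(account_id):
    return fetch_profile_from_storage(account_id)

def fetch_profile_from_storage(account_id):
    pass  # hypothetical: hit the storage tier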

Slide 73

Choose your storage compromises
Object storage, distributed KV (eventual consistency), SQL (no P or A)

Slide 74

CONFIGURATION THROUGH SERVICE REGISTRIES

Slide 75

Keep track of node volatility
Reprovision configuration on cluster topology changes.
Load-balancers make a great interaction point (concentrate changes there).

Slide 76

The service registry is critical
Ideally it needs to be a distributed, transactional system.
You already have an AP one: DNS!

Slide 77

ZooKeeper as a service registry
Current best in class.
Promotes in-app usage as well, for distributed locks, barriers, etc.

Slide 78

public class ClusterClient extends LeaderSelectorListenerAdapter {
    private final LeaderSelector leaderSelector;

    public ClusterClient(CuratorFramework client, String path) {
        leaderSelector = new LeaderSelector(client, path, this);
        leaderSelector.autoRequeue();
    }

    @Override
    public void takeLeadership(CuratorFramework client) throws Exception {
        // schedule actions as leader
    }
}

Slide 79

IMMUTABLE INFRASTRUCTURE

Slide 80

No more fixing nodes
Human intervention means configuration drift.

Slide 81

Configuration drift? Reprovision node.
New version of software? Reprovision node.
Configuration file change? Reprovision node.

Slide 82

«Cook» images as part of your CI
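For example, a CI step might shell out to packer (named on a later slide) so that every merge bakes a fresh image; the template name is hypothetical:

import subprocess

def bake_image():
    # "packer build" produces a machine image from a template, so a deploy
    # becomes a reprovision from a known-good image, not a mutation
    subprocess.run(['packer', 'build', 'webapp.json'], check=True)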

Slide 83

Move away from the node as the base unit of reasoning
All nodes in a cluster should be equivalent.

Slide 84

LOOKING FORWARD
the cluster is the computer

Slide 85

Node cooking DSLs
packer, veewee, vagrant

Slide 86

Old is new
Containers are gaining traction (docker, lxc, zerovm).

Slide 87

Generic platform abstractions
PaaS solutions as a commodity
Generic scheduling and failover frameworks: Mesos

Slide 88

THANK YOU! Questions?
github: pyr
twitter: @pyr
Ask me for a CHF 50 voucher on exoscale!