
From vertical to horizontal: the challenges of scalability in the cloud



Pierre-Yves Ritschard

October 24, 2013



  1. FROM VERTICAL TO HORIZONTAL: THE CHALLENGES OF SCALABILITY IN THE CLOUD @pyr
  2. SHORT BIO Pierre-Yves Ritschard
     CTO @ exoscale - the leading Swiss public cloud provider
     Open Source Developer - riemann, collectd, pallet, openbsd
     Architect of several cloud platforms - paper.li
     Recovering Operations Engineer
  3. SCALABILITY « The ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. »
  4. SCALABILITY Culture Organization & Process Technical Architecture Operations

  5. SCALABILITY Technical Architecture Operations Culture Organization & Process

  6. Scaling geometry · Recent history · Enter the cloud · Distributed headaches · Architecture drivers · Looking forward
  7. QUICK NOTES «cloud» is an umbrella term; I will conflate cloud and public IAAS. Oriented towards web application design, but applicable to all sorts of applications.
  8. SCALING GEOMETRY Vertical, Horizontal and Diagonal

  9. Vertical (scaling up) Adding resources to a single node in the system.
  10. Horizontal (scaling out) Accommodating growth by adding more nodes to a system.
  11. Diagonal The most common scaling strategy: first add resources, then distribute workload across nodes.
  12. RECENT HISTORY Leading up to IAAS

  14. Wherever possible, a great approach

  15. Why stop ?

  17. MOORE'S LAW « Over the history of computing, the number of transistors on integrated circuits doubles approximately every two years. »
  19. Average core speed has been stable for several years; the number of cores per node keeps increasing.
  20. Vertical scaling challenges 424 pages

  21. Vertical scaling challenges Threads ?

  22. Meanwhile...


  24. IT as a utility

  25. Programmable provisioning and decommissioning

  26. Flexible node sizes (CPU, RAM, Disk)

  27. Pay as you go model

  28. UPSIDE

  29. Much lower capacity planning overhead

  30. OPEX makes billing dept. happy

  31. Nobody likes to change disks or rack servers

  32. switches? gone. vlans? gone. ip allocation and translation? gone. OS partitioning? gone. OS raid management? gone.
  34. (node-spec
        :network  {:inbound-ports [22, 80, 443]}
        :image    {:os-family :ubuntu, :os-version-matches "12.04"}
        :hardware {:min-cores 1, :min-disk 10, :min-ram 512})

  36. It's hard to break out of the big iron mental model
  37. It's hard to change our trust model « I want to be able to see my servers »
  38. There's still an upper limit on single node size It is usually lower than what you had in-house.
  39. Beware the...


  41. Two interacting nodes already imply a distributed system Fewer SPOFs, but a larger number of failure scenarios.
  42. Distributed systems are subject to the CAP / Brewer theorem: you cannot enjoy all three of Consistency, Availability, and Partition Tolerance.
  43. Consistency Simultaneous requests see a consistent set of data.

  44. Availability Each incoming request is acknowledged and receives a success or failure response.
  45. Partition Tolerance The system continues to process incoming requests in the face of node failures.
  46. ARCHITECTURE DRIVERS Eliminating complexity to focus on higher order problems.

  47. Composable, inspectable services · Queues over RPC · Degrade gracefully · Prefer concerned citizens · Configuration from a service registry · Nodes as immutable data structures

  49. Service oriented approach Simplicity through decoupling

  50. No internal semantics on the wire Remember CORBA, RMI?

  51. Loose contracts across service boundaries Liberal in what you accept, conservative in what you send.
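This principle can be sketched in a few lines of Python. The message shape (a hypothetical mailout payload; the field names are assumptions for illustration, not from the deck) tolerates unknown fields on input and emits only the advertised fields on output:

```python
def parse_mailout(payload):
    """Liberal in what we accept: tolerate unknown fields,
    require only the keys we actually rely on."""
    required = ("request_id", "recipients")
    missing = [k for k in required if k not in payload]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # Conservative in what we send: forward only the advertised fields.
    return {
        "request_id": payload["request_id"],
        "recipients": list(payload["recipients"]),
        "content": payload.get("content", ""),
    }
```

A sender that later adds fields does not break this consumer, while a sender that drops a required field fails loudly at the boundary.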
  52. Transfer data, not state JSON has flaws, but it is the lingua franca.


  54. Build inspection into services from the start Count acknowledged, processed, and failed requests. Time actions to quickly identify hot spots.
  55. Rely on unobtrusive inspection UDP is often sufficient.

  56. Leverage proven existing tools statsd, riemann, yammer-metrics, JMX

  57. @wrap_riemann("activate-account")
      def activate_account(uuid, account_type, sla=None):
          if account_type == ACCOUNT_TYPE_ANONYMOUS:
              activation_status = activate_shim_account()
          elif account_type == ACCOUNT_TYPE_STANDARD:
              activation_status = activate_sla_account(uuid, sla)
          return activation_status

  59. RPC couples systems Your service's CAP properties become tied to those of the RPC provider.
  60. Take responsibility out of the callee as soon as possible Textbook example: SMTP.
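The SMTP pattern above, sketched in Python: acknowledge the caller as soon as the message is safely enqueued, and let a separate worker handle delivery. (The function and queue names here are hypothetical, chosen to match the mailout example elsewhere in the deck.)

```python
import queue
import uuid

# In a real system this would be an external AP queue (Kafka, RabbitMQ, ...);
# a stdlib queue stands in for it here.
outbox = queue.Queue()

def accept_mailout(recipients, content):
    """Acknowledge immediately; delivery happens asynchronously,
    SMTP-style, when a worker drains the queue."""
    request_id = str(uuid.uuid4())
    outbox.put({
        "request_id": request_id,
        "recipients": list(recipients),
        "content": content,
    })
    return {"status": "accepted", "request_id": request_id}
```

The caller's latency and availability no longer depend on the delivery path being up.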
  61. Queues promote statelessness
      { request_id:  "183308a1-07e6-401f-983f-dcdcd4217ae7",
        source_host: "3c576773-870d-43fa-bfde-792f71ff6532",
        action:      "mailout",
        recipients:  ["foo@example.com", "bar@example.com"],
        content:     "..." }
  62. Queues help shape the system dynamically Queue backlog growing? Spin new workers up!
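The scaling decision can be as simple as a backlog calculation. A minimal sketch (the parameters and the cap are assumptions, not from the deck):

```python
import math

def workers_needed(backlog, per_worker_rate, drain_seconds, max_workers=20):
    """Workers required to drain `backlog` messages within `drain_seconds`,
    given each worker processes `per_worker_rate` messages per second."""
    needed = math.ceil(backlog / (per_worker_rate * drain_seconds))
    # Always keep at least one worker, and respect the provisioning cap.
    return max(1, min(max_workers, needed))
```

Polling the queue depth and reconciling the worker count against this target is what "shaping the system dynamically" amounts to in practice.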

  64. Embrace failure because systems will fail, in ways you didn't expect.
  65. Avoid failure propagation Implement back pressure to avoid killing loaded systems. Queues make great pressure valves.
  66. Don't give up Use connection pooling and retry policies. Best in class: finagle, cassandra-driver.
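A retry policy like the ones finagle and cassandra-driver ship can be sketched in a few lines; this is a generic exponential-backoff-with-jitter decorator, not the API of either library:

```python
import random
import time

def retry(attempts=5, base_delay=0.1, max_delay=2.0):
    """Retry a flaky call with exponential backoff and jitter."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of attempts: propagate the failure
                    delay = min(max_delay, base_delay * (2 ** attempt))
                    # Jitter spreads retries out so callers don't stampede.
                    time.sleep(delay * random.uniform(0.5, 1.0))
        return wrapper
    return decorator
```

The backoff cap and the jitter both matter: without them, synchronized retries from many nodes can finish off an already struggling service.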
  67. Keep systems on SQL down? No more account creation, but still serving existing customers.

  69. All moving parts in your distributed system force compromises This is true of your own components and of external ones.
  70. Choose components accordingly

  71. You probably want an AP queueing system So please avoid using MySQL as one! Candidates: Apache Kafka, RabbitMQ, Kestrel (and redis, to a lesser extent).
  72. Cache locally Much higher aggregate cache capacity, and no huge SPOF.

  73. Choose your storage compromises Object storage, distributed KV (eventual consistency), SQL (no P or A).

  75. Keep track of node volatility Reprovision configuration on cluster topology changes. Load balancers make a great interaction point (concentrate changes there).
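Concentrating topology changes at the load balancer often boils down to rendering its backend list from the registry. A sketch (the haproxy-like stanza syntax is shown for illustration only):

```python
def render_backends(name, nodes):
    """Render a load-balancer backend stanza from the current
    registry membership, one `server` line per (host, port) pair."""
    lines = ["backend %s" % name]
    # Sort for a deterministic config: no spurious diffs on reload.
    for i, (host, port) in enumerate(sorted(nodes)):
        lines.append("    server %s-%d %s:%d check" % (name, i, host, port))
    return "\n".join(lines)
```

Regenerating and reloading this on every registry change means individual services never need to know who their peers are.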
  76. The service registry is critical It ideally needs to be a distributed, transactional system. You already have an AP one: DNS!
  77. Zookeeper as a service registry Current best in class. Also useful in-app for distributed locks, barriers, etc.
  78. public class ClusterClient extends LeaderSelectorListenerAdapter {
          private final LeaderSelector leaderSelector;

          public ClusterClient(CuratorFramework client, String path) {
              leaderSelector = new LeaderSelector(client, path, this);
              leaderSelector.autoRequeue();
          }

          @Override
          public void takeLeadership(CuratorFramework client) throws Exception {
              // schedule actions as leader
          }
      }

  80. No more fixing nodes Human intervention means configuration drift.

  81. Configuration drift? Reprovision the node. New version of software? Reprovision the node. Configuration file change? Reprovision the node.
  82. «Cook» images as part of your CI
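Cooking an image in CI can be as small as a Packer template built and published on every green run. A minimal sketch using Packer's (2013-era) JSON format; the region, AMI name, and script path are placeholders, not values from the deck:

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "eu-west-1",
    "source_ami": "ami-PLACEHOLDER",
    "instance_type": "t1.micro",
    "ssh_username": "ubuntu",
    "ami_name": "webapp-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "script": "install-webapp.sh"
  }]
}
```

CI then runs `packer build` after the test suite, so every deployable artifact is a fully baked, immutable image rather than a node to be mutated in place.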

  83. Depart from using the node as the base unit of reasoning All nodes in a cluster should be equivalent.
  84. LOOKING FORWARD the cluster is the computer

  85. Node cooking DSLs packer, veewee, vagrant

  86. Old is new Containers are gaining traction (docker, lxc, zerovm).

  87. Generic platform abstractions PAAS solutions as a commodity. Generic scheduling and failover frameworks: Mesos.
  88. THANK YOU! Questions? github: pyr · twitter: @pyr Ask me for a CHF 50 voucher on exoscale!