From vertical to horizontal: the challenges of scalability in the cloud

Pierre-Yves Ritschard

October 24, 2013

Transcript

  1. SHORT BIO

    Pierre-Yves Ritschard
    CTO @ exoscale - The leading Swiss public cloud provider
    Open Source Developer - riemann, collectd, pallet, openbsd
    Architect of several cloud platforms - paper.li
    Recovering Operations Engineer
  2. SCALABILITY

    « The ability of a system, network, or process to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth. »
  3. QUICK NOTES

    «cloud» is an umbrella term
    I will conflate cloud and public IaaS
    Oriented towards web application design
    Applicable to all sorts of applications
  4. MOORE'S LAW

    « Over the history of computing, the number of transistors on integrated circuits doubles approximately every two years. »
  5. switches? gone. vlans? gone. ip allocation and translation? gone.

    OS partitioning? gone. OS raid management? gone.
  6. (node-spec

      :network {:inbound-ports [22, 80, 443]},
      :image {:os-family :ubuntu,
              :os-version-matches "12.04"},
      :hardware {:min-cores 1,
                 :min-disk 10,
                 :min-ram 512})
  7. There's still an upper limit on single node size

    It is usually lower than what you had in-house
  8. Distributed systems are subject to the CAP / Brewer Theorem.

    You cannot enjoy all three of: Consistency, Availability, Partition Tolerance
  9. Composable, Inspectable services

    Queues over RPC
    Degrade gracefully
    Prefer concerned citizens
    Configuration from a service registry
    Nodes as immutable data structures
  10. Build inspection into services from the start

    Count acked, processed, and failed requests.
    Time actions to quickly identify hot spots.
  11. @wrap_riemann("activate-account")

    def activate_account(uuid, account_type, sla=None):
        if account_type == ACCOUNT_TYPE_ANONYMOUS:
            activation_status = activate_shim_account()
        elif account_type == ACCOUNT_TYPE_STANDARD:
            activation_status = activate_sla_account(uuid, sla)
        return activation_status
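The deck does not show how `wrap_riemann` is implemented. As a rough sketch of the idea from slides 10 and 11, a decorator can record the outcome and duration of every call. The names `wrap_metrics`, `emit` and `EVENTS` below are hypothetical, and the in-memory list stands in for whatever would actually ship events to Riemann.

```python
import functools
import time

# Stand-in sink (assumption): a real setup would forward events to Riemann.
EVENTS = []


def emit(event):
    """Record an event; swap this for a network send in a real deployment."""
    EVENTS.append(event)


def wrap_metrics(service):
    """Decorator reporting the state and duration of each call to `service`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Failed requests are counted too, then the error propagates.
                emit({"service": service, "state": "failed",
                      "metric": time.monotonic() - start})
                raise
            emit({"service": service, "state": "ok",
                  "metric": time.monotonic() - start})
            return result
        return wrapper
    return decorator


@wrap_metrics("activate-account")
def activate_account(uuid):
    # Simplified, hypothetical body used only to exercise the decorator.
    if not uuid:
        raise ValueError("missing uuid")
    return "activated"
```

Because timing and counting live in the decorator, the business function stays unchanged, and every wrapped action reports in the same shape.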
  12. Queues promote stateless-ness

    {request_id: "183308a1-07e6-401f-983f-dcdcd4217ae7",
     source_host: "3c576773-870d-43fa-bfde-792f71ff6532",
     action: "mailout",
     recipients: ["foo@example.com", "bar@example.com"],
     content: "..."}
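A minimal sketch of why such a message promotes statelessness: the worker needs nothing beyond the message itself, so any node can pick it up. The standard-library `queue.Queue` stands in for a real broker, and `handle_mailout`, `HANDLERS` and `work` are hypothetical names. The `"hello"` body is a placeholder, since the slide elides the content.

```python
import json
import queue


def handle_mailout(message):
    """Build one delivery record per recipient, using only the message."""
    return [{"to": rcpt, "body": message["content"],
             "request_id": message["request_id"]}
            for rcpt in message["recipients"]]


# Dispatch table keyed on the message's "action" field.
HANDLERS = {"mailout": handle_mailout}


def work(jobs):
    """Drain the queue; the loop keeps no state between messages."""
    results = []
    while True:
        try:
            raw = jobs.get_nowait()
        except queue.Empty:
            return results
        message = json.loads(raw)
        results.extend(HANDLERS[message["action"]](message))


jobs = queue.Queue()  # stand-in for a real queueing system
jobs.put(json.dumps({
    "request_id": "183308a1-07e6-401f-983f-dcdcd4217ae7",
    "action": "mailout",
    "recipients": ["foo@example.com", "bar@example.com"],
    "content": "hello",  # placeholder body
}))
deliveries = work(jobs)
```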
  13. Keep systems on

    SQL down? No more account creation, still serving existing customers.
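One way to picture this kind of graceful degradation, as a sketch under assumptions: writes that need the database are refused cleanly, while reads for known customers come from a local copy. `AccountService`, `DatabaseDown` and `BrokenDb` are hypothetical names, and the dictionary cache stands in for whatever read path survives a SQL outage.

```python
class DatabaseDown(Exception):
    """Raised by the hypothetical storage layer when SQL is unreachable."""


class AccountService:
    def __init__(self, db, cache):
        self.db = db        # authoritative store, may be unavailable
        self.cache = cache  # local copy of already-known accounts

    def create_account(self, name):
        try:
            self.db.insert(name)
        except DatabaseDown:
            # Writes need the database: refuse cleanly rather than crash.
            return {"ok": False, "reason": "signups temporarily disabled"}
        self.cache[name] = {"name": name}
        return {"ok": True}

    def get_account(self, name):
        # Existing customers are served without touching SQL.
        return self.cache.get(name)


class BrokenDb:
    """Simulates the outage: every insert fails."""

    def insert(self, name):
        raise DatabaseDown()


service = AccountService(BrokenDb(), {"alice": {"name": "alice"}})
```

With the database down, `service.create_account("bob")` reports that signups are disabled, while `service.get_account("alice")` still returns the cached account.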
  14. All moving parts in your distributed system force compromises

    This is true of your components and external ones
  15. You probably want an AP queueing system

    So please avoid using MySQL as one!
    Candidates: Apache Kafka, RabbitMQ, Kestrel. (Redis, to a lesser extent)
  16. Keep track of node volatility

    Reprovisioning of configuration on cluster topology changes
    Load-balancers make a great interaction point (concentrate changes there).
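As an illustration of concentrating topology changes at the load balancer, the sketch below re-renders a backend list only when cluster membership actually changes. `LoadBalancerConfig` and `render_backends` are hypothetical names, the output uses an HAProxy-like `server` line as an assumed format, and the reload counter stands in for a real balancer reload.

```python
def render_backends(nodes, port=80):
    """Render one backend line per node, sorted for stable output."""
    return "\n".join(
        f"server {name} {addr}:{port} check"
        for name, addr in sorted(nodes.items())
    )


class LoadBalancerConfig:
    def __init__(self):
        self.nodes = {}
        self.rendered = ""
        self.reloads = 0

    def update(self, nodes):
        """Re-render only when cluster membership actually changed."""
        if nodes == self.nodes:
            return False
        self.nodes = dict(nodes)
        self.rendered = render_backends(self.nodes)
        self.reloads += 1  # a real setup would reload the balancer here
        return True


lb = LoadBalancerConfig()
lb.update({"web1": "10.0.0.1", "web2": "10.0.0.2"})
lb.update({"web1": "10.0.0.1", "web2": "10.0.0.2"})  # unchanged: no reload
lb.update({"web1": "10.0.0.1"})                       # web2 went away
```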
  17. The service registry is critical

    Ideally needs to be a distributed transactional system
    You already have an AP one: DNS!
  18. Zookeeper as a service registry

    Current best in class.
    Promotes usage in-app as well for distributed locks, barriers, etc.
  19. public class ClusterClient extends LeaderSelectorListenerAdapter {

        private final LeaderSelector leaderSelector;

        public ClusterClient(CuratorFramework client, String path) {
            leaderSelector = new LeaderSelector(client, path, this);
            leaderSelector.autoRequeue();
        }

        @Override
        public void takeLeadership(CuratorFramework client) throws Exception {
            // schedule actions as leader
        }
    }
  20. Configuration drift? reprovision node

    New version of software? reprovision node
    Configuration file change? reprovision node
  21. Depart from using the node as the base unit of reasoning

    All nodes in a cluster should be equivalent
  22. THANK YOU! Questions?

    github pyr
    twitter @pyr
    ask me for a CHF50 voucher on exoscale!