Mesomatic: the cluster is a library

Pierre-Yves Ritschard

June 24, 2015

Transcript

  1. @PYR
    CTO at Exoscale, Swiss Cloud Hosting. Open source developer: pithos, cyanite, riemann, collectd, mesomatic.
  2. AIM OF THIS TALK
    Discouraging you from building distributed systems from scratch. Presenting a simple way to interact with Mesos. Yet another introduction to CAP.
  3. IT ALWAYS STARTS WITH A SIMPLE PRODUCT
    You want to change the world by disrupting the job board industry. Standard three-tier, self-contained app. Does not fall into the usual definition of distributed systems.
  4. FIRST SIGNS OF SUCCESS
    Your single server is not sufficient anymore. Database gets its own machines, adding new web servers fixes the issue. Logging becomes a bit harder. You switch to a centralized logging solution.
  5. FEATURES GET ADDED
    Subscription emails. Doing it synchronously is impossible. Let's add a worker (and thus a queueing mechanism). You start switching from pets to cattle.
  6. SEED MONEY RUNS OUT
    Let's try other monetization techniques. Freemium model with analytics. Where do I run these batch jobs? You partner with another company to exchange data. They have this weird legacy system and the only client lib is in PHP :-(
  7. AS THE PRODUCT GROWS, SO DOES INFRASTRUCTURE
    You're now at 3 Jenkins slaves. You had to split metrics and monitoring onto separate machines. You introduce a command and control solution to perform your regular operations. It's time to use puppet (you're really starting to feel like an ops person now).
  8. LET'S TAKE A STEP BACK
    You're ticking all the boxes: CI, Infrastructure as Code, DevOps.
  9. BUT RESOURCE UTILIZATION IS LOW
    Most of it is artificial (agents on every node). But you still have regular peak-induced contention.
  10. ADDING NEW SERVICES OR COMPONENTS IS HARD
    Should your most active git repository really be the puppet one? You constantly have to make allocation decisions.
  11. HANDLING FAILURE IS HARD
    Your monitoring system tells you when something breaks. You have to recreate machines manually and update configuration all over the place.
  12. IT'S NOT YOUR JOB
    How do you get out of the business of shuffling configuration and apps around?
  13. WHAT IT SAYS ON THE BOX
    Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.
  14. MESOS MASTERS
    Expose an HTTP & Protobuf API. Expose a web UI. Main entry point for mesos frameworks. Gather and maintain slave availability and capacity information. Highly available and horizontally scalable.
  15. MESOS SLAVES
    Used to launch tasks. Tasks are isolated. Expose their resources to masters: CPU, RAM, IPs, volumes, attributes, labels.
  16. MESOS SCHEDULERS
    Interact with the mesos master. Receive offers and task statuses. Launch tasks by picking up on provided offers. Responsible for exposing services if need be.
  17. MESOS EXECUTORS
    Responsible for managing workload on mesos slaves. Task status is reported back to schedulers. Optional: a mesos slave may directly run namespaced commands or start docker containers.
  18. OFFER BASED ALLOCATION MODEL
    Scheduler receives a list of offers. It decides which ones it wants to pick up and run tasks on. An updated list of offers is then delivered. Resource offers are decided by mesos' allocator. This puts a lot of responsibility on the scheduler.
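The accept-or-decline step of the offer cycle can be sketched in plain Clojure, with no Mesos dependency. The names `fits?` and `pick-offers` and the flattened offer maps (`:id`, `:cpus`, `:mem`) are assumptions made for illustration only; real offers carry a list of typed resources.

```clojure
;; Hypothetical, simplified offers: real Mesos offers are richer.
(def offers
  [{:id "offer-1" :cpus 0.25 :mem 256.0}
   {:id "offer-2" :cpus 4.0  :mem 2048.0}
   {:id "offer-3" :cpus 1.0  :mem 1024.0}])

(defn fits?
  "True when the offer has at least the cpus and mem the task needs."
  [task offer]
  (and (>= (:cpus offer) (:cpus task))
       (>= (:mem offer) (:mem task))))

(defn pick-offers
  "Splits offers into those the scheduler would accept for `task`
  and those it would decline."
  [task offers]
  {:accepted (filterv (partial fits? task) offers)
   :declined (filterv (complement (partial fits? task)) offers)})

;; A task needing half a CPU and 512 MB of memory.
(def decision (pick-offers {:cpus 0.5 :mem 512.0} offers))
```

In a real scheduler the accepted offers would be handed to the driver together with task descriptions, and the declined ones returned to the master so other frameworks can use them.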
  19. MESOS TASKS: ADDITIONAL PROPERTIES
    May provide a health-check. May ask for port forwards. May ask for persistent storage volumes.
  20. MESOMATIC
        [[spootnik/mesomatic "0.22.1-r0"]
         [spootnik/mesomatic-async "0.22.1-r0"]]
  21. WHY A LIBRARY
    Existing frameworks might not fit your constraints. Distributing workload from within your application. Many frameworks will share a common set of needs. Great foundation for aaS type products.
  22. COMPONENTS
    Clojure types for mesos data structures. Facade for creating executors. Facade for creating schedulers. core.async versions of the facades. Purely functional allocation helper.
  23. CLOJURE TYPES
    Namespace mesomatic.types. Conversion functions to and from clojure data: pb->data, ->pb, data->pb.
  24. CLOJURE TYPES
        (data->pb (->TaskID "my-task"))
        ;; => #object[org.apache.mesos.Protos$TaskID ...]

        (->pb :TaskID {:value "my-task"})
        ;; => #object[org.apache.mesos.Protos$TaskID ...]

        (pb->data (-> (Protos$TaskID/newBuilder)
                      (.setValue "my-task")
                      (.build)))
        ;; => #mesomatic.types.TaskID{:value "my-task"}
  25. EXECUTOR AND SCHEDULER FACADES
    Protocol based. Client code reifies a protocol to receive input messages from mesos. Side-effects (output) are performed through a driver (SchedulerDriver or ExecutorDriver).
  26. EXECUTOR FACADE
        (defprotocol Executor
          (disconnected [this driver])
          (error [this driver message])
          (framework-message [this driver data])
          (kill-task [this driver task-id])
          (launch-task [this driver task])
          (registered [this driver executor-info framework-info slave-info])
          (reregistered [this driver slave-info])
          (shutdown [this driver]))

        (defprotocol ExecutorDriver
          (abort! [this])
          (join! [this])
          (run-driver! [this])
          (send-framework-message! [this data])
          (send-status-update! [this status])
          (start! [this])
          (stop! [this]))
  27. EXECUTOR FACADE: INPUT
    launch-task: Executor was instructed to launch a task.
    framework-message: Executor received a framework message from a scheduler.
  28. EXECUTOR FACADE: OUTPUT
    send-framework-message!: Sends a message back to the scheduler.
    send-status-update!: Reports task status to the scheduler.
  29. SCHEDULER FACADE
        (defprotocol Scheduler
          (registered [this driver framework-id master-info])
          (reregistered [this driver master-info])
          (disconnected [this driver])
          (resource-offers [this driver offers])
          (offer-rescinded [this driver offer-id])
          (status-update [this driver status])
          (framework-message [this driver executor-id slave-id data])
          (slave-lost [this driver slave-id])
          (executor-lost [this driver executor-id slave-id status])
          (error [this driver message]))
  30. SCHEDULER DRIVER
        (defprotocol SchedulerDriver
          (abort! [this])
          (decline-offer [this offer-id] [this offer-id filters])
          (join! [this])
          (kill-task! [this task-id])
          (launch-tasks! [this offer-id tasks] [this offer-id tasks filters])
          (reconcile-tasks [this statuses])
          (request-resources [this requests])
          (revive-offers [this])
          (run-driver! [this])
          (send-framework-message! [this executor-id slave-id data])
          (start! [this])
          (stop! [this] [this failover?]))
  31. SCHEDULER FACADE: INPUT
    resource-offers: Full available resource topology.
    offer-rescinded: Unavailable resource.
    framework-message: Incoming message from executor.
    status-update: Task status update from executor.
  32. SCHEDULER FACADE: OUTPUT
    launch-tasks!: Schedule tasks to be run on an executor.
    kill-task!: Schedule a task termination.
    reconcile-tasks: Get back all task statuses.
    request-resources: Ask for new resources.
  33. WHAT IT COMES DOWN TO
    You receive:
        ;; Scheduler
        (resource-offers [this driver offers])
    You send:
        ;; SchedulerDriver
        (launch-tasks! [this offer-id tasks])
  34. ANATOMY OF A RESOURCE OFFER
        {:id {:value "20150506-..."},
         :hostname "localhost",
         :resources [{:name "mem", :type :value-scalar, :scalar 6918.0}
                     {:name "cpus", :type :value-scalar, :scalar 8.0}
                     {:name "disk", :type :value-scalar, :scalar 24989.0}
                     {:name "ports", :type :value-ranges,
                      :ranges [{:begin 31000, :end 32000}]}],
         :attributes []}
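Since an offer is plain Clojure data, summarizing it takes only core functions. The sketch below mirrors the offer shown on the slide; the `"offer-0"` id and the helpers `scalar-resources` and `port-count` are hypothetical names introduced here, not part of mesomatic.

```clojure
;; An offer shaped like the one on the slide. The id is a placeholder.
(def offer
  {:id {:value "offer-0"}
   :hostname "localhost"
   :resources [{:name "mem"   :type :value-scalar :scalar 6918.0}
               {:name "cpus"  :type :value-scalar :scalar 8.0}
               {:name "disk"  :type :value-scalar :scalar 24989.0}
               {:name "ports" :type :value-ranges
                :ranges [{:begin 31000 :end 32000}]}]
   :attributes []})

(defn scalar-resources
  "Returns a map of resource name to amount for the scalar resources of an offer."
  [offer]
  (into {}
        (for [{:keys [name type scalar]} (:resources offer)
              :when (= type :value-scalar)]
          [name scalar])))

(defn port-count
  "Counts the ports covered by the offer's port ranges (both ends inclusive)."
  [offer]
  (reduce +
          (for [{:keys [name ranges]} (:resources offer)
                :when (= name "ports")
                {:keys [begin end]} ranges]
            (inc (- end begin)))))
```

A scheduler could use such summaries to compare what a host offers against what a task requests before deciding to launch.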
  35. ANATOMY OF A TASK
    Namespaced commands
        {:name "mytask"
         :task-id "task-0"
         :resources [{:name "mem" :type :value-scalar :scalar 512.0}
                     {:name "cpus" :type :value-scalar :scalar 0.5}]
         :command {:shell true
                   :value "java -jar /opt/work/myjob.jar"
                   :environment [{:name "DATABASE_HOST" :value "dbhost"}
                                 {:name "DATABASE_PORT" :value "5432"}]}}
  36. ANATOMY OF A TASK
    Docker containers
        {:name "mydockertask"
         :task-id "docker-task-0"
         :resources [{:name "mem" :type :value-scalar :scalar 512.0}
                     {:name "cpus" :type :value-scalar :scalar 0.5}]
         :command {:shell false}
         :container {:image "dockerfile/nginx"
                     :port-mappings [{:container-port 80}]
                     :network :docker-network-bridge}}
  37. CORE.ASYNC FACADES
        (defmulti handle-mesos-message (fn [_ {:keys [type]}] type))

        (let [ch        (chan)
              sched     (async/scheduler ch)
              framework {:name "My own framework"}
              driver    (sched/scheduler-driver sched framework "localhost:5050")]
          (sched/start! driver)
          (a/reduce! handle-mesos-message {:driver driver} ch))
  38. ALLOCATION HELPER
    Protocol based
        (defprotocol IAllocator
          (allocate [this offers tasks]))
  39. ALLOCATOR IMPLEMENTATION
    Single one for now: naive-allocator. Accounts for number of instances of a task and colocation factors. "I want 6 processes but no more than 2 on a single host"
  40. ADDITIONAL TASK FIELDS
        {:name "mytask"
         :task-id "task-0"
         :resources [{:name "mem" :type :value-scalar :scalar 512.0}
                     {:name "cpus" :type :value-scalar :scalar 0.5}]
         :command {:shell true
                   :value "java -jar /opt/work/myjob.jar"}
         :maxcol 2
         :count 6}
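How `:count` and `:maxcol` could constrain placement is sketched below in plain Clojure. This is not mesomatic's naive-allocator: the functions `capacity` and `place`, the flattened `:cpus`/`:mem` keys and the host names are all assumptions for illustration, and the sketch assumes at most one offer per host.

```clojure
(defn capacity
  "How many instances of `task` fit in `offer`, capped by the task's :maxcol."
  [{:keys [cpus mem maxcol]} offer]
  (min maxcol
       (long (quot (:cpus offer) cpus))
       (long (quot (:mem offer) mem))))

(defn place
  "Walks the offers in order and returns a map of hostname to instance count,
  or nil when the offers cannot hold :count instances."
  [task offers]
  (loop [remaining (:count task)
         offers    offers
         placed    {}]
    (cond
      (zero? remaining) placed
      (empty? offers)   nil
      :else
      (let [offer (first offers)
            n     (min remaining (capacity task offer))]
        (recur (- remaining n)
               (rest offers)
               (if (pos? n)
                 (assoc placed (:hostname offer) n)
                 placed))))))

;; "I want 6 processes but no more than 2 on a single host"
(def task {:cpus 0.5 :mem 512.0 :count 6 :maxcol 2})

;; Hypothetical hosts with flattened resources.
(def offers
  [{:hostname "host-a" :cpus 8.0 :mem 6918.0}
   {:hostname "host-b" :cpus 1.0 :mem 4096.0}
   {:hostname "host-c" :cpus 0.5 :mem 1024.0}
   {:hostname "host-d" :cpus 4.0 :mem 2048.0}])
```

Returning nil when the full count cannot be placed keeps the decision all-or-nothing; a scheduler could instead launch a partial set and wait for further offers.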
  41. CONFIGURATION
        [{:name "analytics"
          :task-id "analytics"
          :resources [{:name "mem" :type :value-scalar :scalar 512.0}
                      {:name "cpus" :type :value-scalar :scalar 0.5}]
          :command {:shell true
                    :value "java -jar /opt/analytics.jar"}
          :schedule "* /20 * * * * *"}
         {:name "usage metering"
          :task-id "metering"
          :resources [{:name "mem" :type :value-scalar :scalar 512.0}
                      {:name "cpus" :type :value-scalar :scalar 0.5}]
          :command {:shell true
                    :value "java -jar /opt/metering.jar"}
          :schedule "* /30 * * * * *"}]
  42. STATE
        {:driver ...
         :running? #{"analytics" "metering"}}
  43. TICKING
    cronj does a very good job at it. Send a message on every tick.
        (defn ticker [ch config]
          (cronj :entries
                 (for [{:keys [name schedule] :as payload} config]
                   {:id name
                    :schedule schedule
                    :handler #(put! ch (assoc payload :type :start-task))})))
  44. STRUCTURE
        (defmulti handle-message (fn [_ message] (:type message)))

        (defn mesos-cron [config]
          (let [ch        (chan)
                sched     (async/scheduler ch)
                framework {:name "mesos-cron"}
                driver    (sched/scheduler-driver sched framework "localhost:5050")
                state     {:driver driver :resources resources}]
            (ticker ch config)
            (sched/start! driver)
            (a/reduce! handle-message {:driver driver} ch)))
  45. HANDLING RESOURCE OFFERS
        (defmethod handle-message :resource-offers
          [state {:keys [offers]}]
          (assoc state :offers offers))
  46. HANDLING TICKS
        (defmethod handle-message :start-task
          [state task]
          (let [{:keys [driver offers running?]} state]
            (cond
              (running? (:task-id task))
              (warn "job is already running, skipping tick")

              :else
              (doseq [[offer-id task] (allocate offers task)]
                (sched/launch-tasks! driver offer-id task))))
          state)
  47. UPDATING STATUS INFO
        (defmethod handle-message :status-update
          [state {:keys [task-id status]}]
          (if (= status :task-running)
            (update state :running? conj task-id)
            (update state :running? disj task-id)))
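Because every handler takes a state and a message and returns a new state, the whole scheduler loop is a reduction. The sketch below replays that idea with a plain vector standing in for the core.async channel, so it runs without mesomatic; the sample messages and the `:default` handler are assumptions added for illustration.

```clojure
(defmulti handle-message
  "Dispatches on the :type of each incoming message."
  (fn [_state message] (:type message)))

(defmethod handle-message :resource-offers
  [state {:keys [offers]}]
  ;; Remember the latest offers for later allocation decisions.
  (assoc state :offers offers))

(defmethod handle-message :status-update
  [state {:keys [task-id status]}]
  ;; Track the set of running task ids.
  (if (= status :task-running)
    (update state :running? conj task-id)
    (update state :running? disj task-id)))

(defmethod handle-message :default
  [state _message]
  ;; Unknown message types leave the state untouched.
  state)

;; A vector stands in for the channel of incoming mesos messages.
(def messages
  [{:type :resource-offers :offers [{:id "offer-1"}]}
   {:type :status-update :task-id "analytics" :status :task-running}
   {:type :status-update :task-id "metering"  :status :task-running}
   {:type :status-update :task-id "analytics" :status :task-finished}
   {:type :registered}])

(def final-state
  (reduce handle-message {:running? #{}} messages))
```

Keeping the handlers pure makes them easy to test this way: feed a recorded sequence of messages and compare the resulting state.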
  48. HANDLING TOPOLOGY CHANGES
    Making use of clojure's properties: observable atoms, clojure.data/diff, clojure.core.match/match.
  49. ATOM-BACKED CONFIGURATION
        (let [config (atom {})                    ;; 1. Hold config in atom
              ch     (chan)                       ;; 2. Create our message chan
              mesos  (make-framework! config ch)  ;; 3. Register w/ mesos
              ticker (tick/create! ch)]           ;; 4. Start scheduler
          (watch/watch-config config)             ;; 5. Listen for config changes
          (add-watch db :sync (converge! ch)))    ;; 6. Update topology on changes
  50. CONVERGING TOPOLOGY
        (defn converge! [ch]
          (fn [_ _ old new]
            (let [side-effects (decisions old new)]
              (doseq [effect side-effects]
                (perform-side-effect ch effect)))))
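The four-argument function returned by `converge!` matches what `add-watch` expects: key, reference, old value, new value. A minimal sketch of that mechanism follows, with a second atom recording the transitions in place of real side effects; the names `config` and `seen` are illustrative.

```clojure
(def config (atom {}))

;; Records every [old new] transition; stands in for real side effects.
(def seen (atom []))

(add-watch config :sync
           (fn [_key _ref old new]
             (swap! seen conj [old new])))

;; Each change to the atom triggers the watch with both values.
(swap! config assoc "analytics" {:status :start})
(swap! config assoc "analytics" {:status :stop})
```

The watch receives both the previous and the current configuration, which is exactly the input needed to compute what has to be started or stopped.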
  51. MAKING CLEAR DECISIONS
        (defn decisions [old new]
          (let [[before after _] (diff old new)
                changed          (intersection (-> after keys set)
                                               (-> before keys set))]
            (mapcat (partial changed-map old new) changed)))
  52. MAKING CLEAR DECISIONS
        (defn changed-map [old new k]
          (let [old (get old k {:status :stop})
                new (get new k {:status :stop})]
            (match [(:runtime old) (:runtime new) (:status old) (:status new)]
              ;; Basic status changes
              [_ _ :stop :stop]   []
              [_ _ :start :stop]  [{:action :stop :task old}]
              [_ _ :stop :start]  [{:action :start :task new}]
              ;; Unit runtime changes
              [:docker :docker _ _]   []
              [:command :command _ _] []
              [:command :docker _ _]  [{:action :stop :task old}
                                       {:action :start :task new}]
              [:docker :command _ _]  [{:action :stop :task old}
                                       {:action :start :task new}])))
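The same decision step can be approximated with only the Clojure standard library. The sketch below swaps core.match for a plain `cond` and deliberately simplifies the rules: any unit whose definition changed while it should keep running is restarted. The names `changed-keys` and `decide` and the sample topologies are assumptions for illustration.

```clojure
(require '[clojure.data :refer [diff]]
         '[clojure.set :refer [union]])

(defn changed-keys
  "Keys whose value differs between old and new, including added and removed ones."
  [old new]
  (let [[only-old only-new _both] (diff old new)]
    (union (set (keys only-old)) (set (keys only-new)))))

(defn decide
  "Returns the actions needed to move unit k from its old to its new definition."
  [old new k]
  (let [before (get old k {:status :stop})
        after  (get new k {:status :stop})
        was?   (= :start (:status before))
        want?  (= :start (:status after))]
    (cond
      (and was? want?) [{:action :stop :task before}
                        {:action :start :task after}]
      was?             [{:action :stop :task before}]
      want?            [{:action :start :task after}]
      :else            [])))

(defn decisions
  "Actions for every changed unit, in key order."
  [old new]
  (vec (mapcat (partial decide old new) (sort (changed-keys old new)))))

;; Hypothetical topologies: "a" changes runtime, "b" is new, "c" is untouched.
(def old-topology
  {"a" {:status :start :runtime :docker}
   "c" {:status :start :runtime :command}})

(def new-topology
  {"a" {:status :start :runtime :command}
   "b" {:status :start :runtime :docker}
   "c" {:status :start :runtime :command}})
```

Untouched units never appear in the diff, so they produce no actions; a unit removed from the new topology falls back to the `{:status :stop}` default and is stopped.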
  53. CLOJURE HELPS
    core.async channels help model the flow of incoming messages nicely. core.match brings clarity to your decision making. Most of this is taken from bundes: https://github.com/pyr/bundes. Immutability helps.
  54. DID MESOS HELP?
    Using mesomatic, it's easy to ensure that your expected topology exists throughout the cluster. Containers ensure good enough isolation. Ops burden is greatly reduced.
  55. FUTURE WORK
    A bigger contributor list :-) Following mesos releases closely. Publishing example frameworks (onyx would be great!). Pure JVM mesos client? A more polished bundes.
  56. QUESTIONS?
    Thanks! @pyr
    If you are writing apps, check out: https://github.com/pyr/unilog