
Mesomatic: the cluster is a library

Pierre-Yves Ritschard

June 24, 2015

Transcript

  1. MESOMATIC: THE CLUSTER IS A LIBRARY

    @PYR #EUROCLOJURE

  2. @PYR

    CTO at Exoscale, Swiss Cloud Hosting. Open source developer: pithos, cyanite, riemann, collectd, mesomatic.
  3. AIM OF THIS TALK

    Discouraging you from building distributed systems from scratch. Presenting a simple way to interact with Mesos. Yet another introduction to CAP.
  4. OUTLINE

    The road to distributed systems, Mesos introduction, Mesomatic, use-cases.

  5. THE ROAD TO DISTRIBUTED SYSTEMS

  6. IT ALWAYS STARTS WITH A SIMPLE PRODUCT

    You want to change the world by disrupting the job board industry. A standard, self-contained three-tier app. It does not fall into the usual definition of a distributed system.
  7. IT ALWAYS STARTS WITH A SIMPLE PRODUCT

  8. FIRST SIGNS OF SUCCESS

    Your single server is no longer sufficient. The database gets its own machines, and adding new web servers fixes the issue. Logging becomes harder, so you switch to a centralized logging solution.
  9. FIRST SIGNS OF SUCCESS

  10. FEATURES GET ADDED

    Subscription emails: sending them synchronously is impossible. Let's add a worker (and thus a queueing mechanism). You start switching from pets to cattle.
  11. FEATURES GET ADDED

  12. SEED MONEY RUNS OUT

    Let's try other monetization techniques: a freemium model with analytics. Where do you run these batch jobs? You partner with another company to exchange data; they have a weird legacy system and the only client library is in PHP :-(
  13. SEED MONEY RUNS OUT

  14. AS THE PRODUCT GROWS, SO DOES INFRASTRUCTURE

    You're now at three Jenkins slaves. You had to split metrics and monitoring onto separate machines. You introduce a command-and-control solution to perform your regular operations. It's time to use Puppet (you're really starting to feel like an ops person now).
  15. LET'S TAKE A STEP BACK

    You're ticking all the boxes: CI, Infrastructure as Code, DevOps.
  16. BUT RESOURCE UTILIZATION IS LOW

    Most of the load is artificial (agents on every node), yet peaks still cause regular contention.
  17. ADDING NEW SERVICES OR COMPONENTS IS HARD

    Should your most active Git repository really be the Puppet one? You constantly have to make allocation decisions.
  18. HANDLING FAILURE IS HARD

    Your monitoring system tells you when something breaks, but you have to recreate machines manually and update configuration all over the place.
  19. FROM A SERVICE POINT OF VIEW IT ALL MAKES SENSE

  20. WHAT WOULD BE NICE

  21. WHAT WOULD BE NICE

  22. ALTERNATIVES

    Containers, private PaaS, cluster allocators.

  23. IT'S NOT YOUR JOB

    How do you get out of the business of shuffling configuration and apps around?
  24. MESOS TO THE RESCUE

  25. WHAT IT SAYS ON THE BOX

    "Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively."
  26. THREE COMPONENT ARCHITECTURE

  27. HOW IT WORKS

    Masters coordinate. Slaves expose resources. Workloads, called frameworks, sit on top.
  28. MESOS MASTERS

    Expose an HTTP and Protobuf API. Expose a web UI. Main entry point for Mesos frameworks. Gather and maintain slave availability and capacity information. Highly available and horizontally scalable.
  29. MESOS SLAVES

    Used to launch tasks. Tasks are isolated. Expose their resources to masters: CPU, RAM, IPs, volumes, attributes, labels.
  30. MESOS FRAMEWORKS

    Software that runs and manages workloads on a Mesos cluster. Two components: schedulers and executors.
  31. MESOS SCHEDULERS

    Interact with the Mesos master. Receive offers and task statuses. Launch tasks by accepting the offers they are given. Responsible for exposing services if need be.
  32. MESOS EXECUTORS

    Responsible for managing workloads on Mesos slaves. Task status is reported back to schedulers. Optional: the Mesos slave may directly run namespaced commands or start Docker containers.
  33. ARCHITECTURE

  34. OFFER-BASED ALLOCATION MODEL

    The scheduler receives a list of offers and decides which ones it wants to accept and run tasks on. Updated lists of offers are delivered afterwards. Which resources get offered is decided by Mesos' allocator. This model puts a lot of responsibility on the scheduler.
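A minimal pure-Clojure sketch of the scheduler's side of this model, not mesomatic's API: offers are assumed to be maps shaped like the resource offer shown later in this deck, and the requirement map `{:cpus .. :mem ..}` as well as the helper names `scalar-resource`, `fits?` and `split-offers` are hypothetical. The first offer that fits is accepted and every other one is declined.

```clojure
;; Hypothetical helpers illustrating offer-based allocation; not part of mesomatic.
(defn scalar-resource
  "Sums the :scalar values of every resource named `resource-name` in an offer.
  Returns 0.0 when the offer does not expose that resource."
  [offer resource-name]
  (->> (:resources offer)
       (filter #(= resource-name (:name %)))
       (map :scalar)
       (reduce + 0.0)))

(defn fits?
  "True when the offer has at least the requested cpus and mem."
  [{:keys [cpus mem]} offer]
  (and (>= (scalar-resource offer "cpus") cpus)
       (>= (scalar-resource offer "mem") mem)))

(defn split-offers
  "Accepts the first offer that fits `requirement` and declines all the others."
  [requirement offers]
  (let [chosen (first (filter #(fits? requirement %) offers))]
    {:accept  (if chosen [chosen] [])
     :decline (vec (remove #(identical? % chosen) offers))}))

;; Example input: the first offer is too small, so the second one is accepted.
(def example-offers
  [{:id {:value "o1"}
    :resources [{:name "cpus" :type :value-scalar :scalar 0.25}
                {:name "mem"  :type :value-scalar :scalar 2048.0}]}
   {:id {:value "o2"}
    :resources [{:name "cpus" :type :value-scalar :scalar 8.0}
                {:name "mem"  :type :value-scalar :scalar 6918.0}]}])

(split-offers {:cpus 0.5 :mem 512.0} example-offers)
```

Declining promptly matters in an offer-based model: resources sitting in an unanswered offer cannot be offered to other frameworks in the meantime.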
  35. MESOS TASKS

    Run in a container: Linux namespaces and cgroups, or Docker. Express their resource needs.
  36. MESOS TASKS: ADDITIONAL PROPERTIES

    May provide a health check. May ask for port forwards. May ask for persistent storage volumes.
  37. NOTABLE FRAMEWORKS

    Marathon, Apache Aurora, Chronos, Jenkins, Hadoop, Apache Kafka, Apache Cassandra.
  38. MESOMATIC: THE CLUSTER IS A LIBRARY

  39. MESOMATIC

        [[spootnik/mesomatic "0.22.1-r0"]
         [spootnik/mesomatic-async "0.22.1-r0"]]
  40. WHY A LIBRARY

    Existing frameworks might not fit your constraints. You can distribute workloads from within your application. Many frameworks share a common set of needs. A great foundation for as-a-service products.
  41. COMPONENTS

    Clojure types for Mesos data structures. A facade for creating executors. A facade for creating schedulers. core.async versions of the facades. A purely functional allocation helper.
  42. CLOJURE TYPES

    mesomatic.types: conversion functions to and from Clojure data: pb->data, ->pb, data->pb.
  43. CLOJURE TYPES

        (data->pb (->TaskID "my-task"))
        ;; => #object[org.apache.mesos.Protos$TaskID ...]

        (->pb :TaskID {:value "my-task"})
        ;; => #object[org.apache.mesos.Protos$TaskID ...]

        (pb->data (-> (Protos$TaskID/newBuilder)
                      (.setValue "my-task")
                      (.build)))
        ;; => #mesomatic.types.TaskID{:value "my-task"}
  44. EXECUTOR AND SCHEDULER FACADES

    Protocol based. Client code reifies a protocol to receive input messages from Mesos, and performs side effects (output) through a driver (SchedulerDriver or ExecutorDriver).
  45. EXECUTOR FACADE

        (defprotocol Executor
          (disconnected [this driver])
          (error [this driver message])
          (framework-message [this driver data])
          (kill-task [this driver task-id])
          (launch-task [this driver task])
          (registered [this driver executor-info framework-info slave-info])
          (reregistered [this driver slave-info])
          (shutdown [this driver]))

        (defprotocol ExecutorDriver
          (abort! [this])
          (join! [this])
          (run-driver! [this])
          (send-framework-message! [this data])
          (send-status-update! [this status])
          (start! [this])
          (stop! [this]))
  46. EXECUTOR FACADE: INPUT

    launch-task: the executor was instructed to launch a task. framework-message: the executor received a framework message from a scheduler.
  47. EXECUTOR FACADE: OUTPUT

    send-framework-message!: sends a message back to the scheduler. send-status-update!: reports task status to the scheduler.
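To see the input/output split without a Mesos cluster, here is a minimal sketch that defines local stand-ins for a subset of the two protocols above (they are not required from mesomatic) and drives them with a hypothetical recording driver. The `echo-executor` name and the simplified status maps `{:task-id .. :state ..}` are assumptions for illustration.

```clojure
;; Local stand-ins mirroring part of the slide's protocols; NOT mesomatic's namespaces.
(defprotocol Executor
  (launch-task [this driver task])
  (framework-message [this driver data]))

(defprotocol ExecutorDriver
  (send-status-update! [this status])
  (send-framework-message! [this data]))

(defn recording-driver
  "Hypothetical fake driver: records every output in the `log` atom instead of reaching Mesos."
  [log]
  (reify ExecutorDriver
    (send-status-update! [_ status] (swap! log conj [:status status]))
    (send-framework-message! [_ data] (swap! log conj [:message data]))))

(defn echo-executor
  "Input arrives through Executor methods; output only leaves through the driver."
  []
  (reify Executor
    (launch-task [_ driver task]
      (send-status-update! driver {:task-id (:task-id task) :state :task-running})
      ;; Real work would happen here, before reporting completion.
      (send-status-update! driver {:task-id (:task-id task) :state :task-finished}))
    (framework-message [_ driver data]
      ;; Answer scheduler messages by echoing them back upper-cased.
      (send-framework-message! driver (.toUpperCase ^String data)))))
```

Because all output goes through the driver, swapping the real driver for a recording one is enough to test executor logic in isolation.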
  48. SCHEDULER FACADE

        (defprotocol Scheduler
          (registered [this driver framework-id master-info])
          (reregistered [this driver master-info])
          (disconnected [this driver])
          (resource-offers [this driver offers])
          (offer-rescinded [this driver offer-id])
          (status-update [this driver status])
          (framework-message [this driver executor-id slave-id data])
          (slave-lost [this driver slave-id])
          (executor-lost [this driver executor-id slave-id status])
          (error [this driver message]))
  49. SCHEDULER DRIVER

        (defprotocol SchedulerDriver
          (abort! [this])
          (decline-offer [this offer-id] [this offer-id filters])
          (join! [this])
          (kill-task! [this task-id])
          (launch-tasks! [this offer-id tasks] [this offer-id tasks filters])
          (reconcile-tasks [this statuses])
          (request-resources [this requests])
          (revive-offers [this])
          (run-driver! [this])
          (send-framework-message! [this executor-id slave-id data])
          (start! [this])
          (stop! [this] [this failover?]))
  50. SCHEDULER FACADE: INPUT

    resource-offers: the available resources, as a list of offers. offer-rescinded: a previously offered resource is no longer available. framework-message: incoming message from an executor. status-update: task status update from an executor.
  51. SCHEDULER FACADE: OUTPUT

    launch-tasks!: schedule tasks to be run on an executor. kill-task!: schedule a task termination. reconcile-tasks: get back all task statuses. request-resources: ask for new resources.
  52. WHAT IT COMES DOWN TO

    You receive:

        ;; Scheduler
        (resource-offers [this driver offers])

    You send:

        ;; SchedulerDriver
        (launch-tasks! [this offer-id tasks])
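The whole scheduler loop can be boiled down to "receive offers, answer with launches or declines". The following minimal sketch shows that round trip with local stand-in protocols (not mesomatic's), a hypothetical recording driver and offer ids simplified to strings; resource fit is deliberately ignored to keep the plumbing visible.

```clojure
;; Local stand-ins for the two methods the slide boils everything down to,
;; plus decline-offer. NOT required from mesomatic.
(defprotocol Scheduler
  (resource-offers [this driver offers]))

(defprotocol SchedulerDriver
  (launch-tasks! [this offer-id tasks])
  (decline-offer [this offer-id]))

(defn recording-driver
  "Hypothetical fake driver: records launches and declines in the `log` atom."
  [log]
  (reify SchedulerDriver
    (launch-tasks! [_ offer-id tasks]
      (swap! log conj [:launch offer-id (mapv :task-id tasks)]))
    (decline-offer [_ offer-id]
      (swap! log conj [:decline offer-id]))))

(defn one-shot-scheduler
  "Hypothetical scheduler: launches `task` on the very first offer it sees
  and declines every later one."
  [task]
  (let [launched? (atom false)]
    (reify Scheduler
      (resource-offers [_ driver offers]
        (doseq [{:keys [id]} offers]
          ;; compare-and-set! flips the flag exactly once, even under concurrency.
          (if (compare-and-set! launched? false true)
            (launch-tasks! driver id [task])
            (decline-offer driver id)))))))
```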
  53. ANATOMY OF A RESOURCE OFFER

        {:id         {:value "20150506-..."}
         :hostname   "localhost"
         :resources  [{:name "mem"   :type :value-scalar :scalar 6918.0}
                      {:name "cpus"  :type :value-scalar :scalar 8.0}
                      {:name "disk"  :type :value-scalar :scalar 24989.0}
                      {:name "ports" :type :value-ranges
                       :ranges [{:begin 31000 :end 32000}]}]
         :attributes []}
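Unlike cpus or memory, ports arrive as ranges rather than scalars. A minimal sketch, assuming offers shaped like the one above and treating range bounds as inclusive; `port-ranges`, `pick-ports` and `ports-resource` are hypothetical helpers, and the request entry simply mirrors the offer's own `:value-ranges` shape.

```clojure
;; Hypothetical helpers for the "ports" resource; not part of mesomatic.
(defn port-ranges
  "Returns the [begin end] pairs of the \"ports\" resource of an offer."
  [offer]
  (for [{:keys [name ranges]} (:resources offer)
        :when (= "ports" name)
        {:keys [begin end]} ranges]
    [begin end]))

(defn pick-ports
  "Returns the first n ports available in the offer, or nil if there are fewer.
  Range bounds are treated as inclusive."
  [offer n]
  (let [available (mapcat (fn [[b e]] (range b (inc e))) (port-ranges offer))
        picked    (vec (take n available))]
    (when (= n (count picked))
      picked)))

(defn ports-resource
  "Builds a task resource entry requesting exactly `ports`, one range per port."
  [ports]
  {:name "ports" :type :value-ranges
   :ranges (mapv (fn [p] {:begin p :end p}) ports)})

;; The offer from the slide.
(def example-offer
  {:id         {:value "20150506-..."}
   :hostname   "localhost"
   :resources  [{:name "mem"   :type :value-scalar :scalar 6918.0}
                {:name "cpus"  :type :value-scalar :scalar 8.0}
                {:name "disk"  :type :value-scalar :scalar 24989.0}
                {:name "ports" :type :value-ranges :ranges [{:begin 31000 :end 32000}]}]
   :attributes []})
```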
  54. ANATOMY OF A TASK

    Namespaced commands:

        {:name      "mytask"
         :task-id   "task-0"
         :resources [{:name "mem"  :type :value-scalar :scalar 512.0}
                     {:name "cpus" :type :value-scalar :scalar 0.5}]
         :command   {:shell       true
                     :value       "java -jar /opt/work/myjob.jar"
                     ;; one map per environment variable
                     :environment [{:name "DATABASE_HOST" :value "dbhost"}
                                   {:name "DATABASE_PORT" :value "5432"}]}}
  55. ANATOMY OF A TASK

    Docker containers:

        {:name      "mydockertask"
         :task-id   "docker-task-0"
         :resources [{:name "mem"  :type :value-scalar :scalar 512.0}
                     {:name "cpus" :type :value-scalar :scalar 0.5}]
         :command   {:shell false}
         :container {:image         "dockerfile/nginx"
                     :port-mappings [{:container-port 80}]
                     :network       :docker-network-bridge}}
  56. CORE.ASYNC FACADES

    Opt-in dependency. Produces all incoming Mesos messages on a channel.
  57. CORE.ASYNC FACADES

        (defmulti handle-mesos-message (fn [_ {:keys [type]}] type))

        (let [ch        (chan)
              sched     (async/scheduler ch)
              framework {:name "My own framework"}
              driver    (sched/scheduler-driver sched framework "localhost:5050")]
          (sched/start! driver)
          (a/reduce handle-mesos-message {:driver driver} ch))
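`a/reduce` folds a function over every value taken from a channel, threading an accumulator through. To show that shape without core.async or a Mesos master, this minimal sketch reduces the same kind of multimethod over a plain vector of messages; the message types, their fields and the `replay` name are hypothetical.

```clojure
;; Dispatch on the :type key of each incoming message, as on the slide.
(defmulti handle-mesos-message (fn [_ {:keys [type]}] type))

(defmethod handle-mesos-message :registered
  [state {:keys [framework-id]}]
  (assoc state :framework-id framework-id))

(defmethod handle-mesos-message :resource-offers
  [state {:keys [offers]}]
  (assoc state :offers offers))

;; Unknown message types leave the state untouched instead of throwing.
(defmethod handle-mesos-message :default
  [state _]
  state)

;; Hypothetical stand-in for (a/reduce handle-mesos-message {:driver driver} ch):
;; here the "channel" is just a vector of messages.
(defn replay
  [messages]
  (reduce handle-mesos-message {:driver :fake-driver} messages))
```

Each handler returns the next state, so the whole framework is a fold over incoming messages and every handler can be tested as a pure function.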
  58. ALLOCATION HELPER

    Protocol based:

        (defprotocol IAllocator
          (allocate [this offers tasks]))
  59. ALLOCATOR IMPLEMENTATION

    A single one for now: naive-allocator. It accounts for the number of instances of a task and for colocation factors: "I want 6 processes but no more than 2 on a single host".
  60. ADDITIONAL TASK FIELDS

        {:name      "mytask"
         :task-id   "task-0"
         :resources [{:name "mem"  :type :value-scalar :scalar 512.0}
                     {:name "cpus" :type :value-scalar :scalar 0.5}]
         :command   {:shell true
                     :value "java -jar /opt/work/myjob.jar"}
         :maxcol    2
         :count     6}
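The slides do not show naive-allocator's internals, so this is only a minimal greedy sketch of the idea behind `:count` and `:maxcol`, not mesomatic's implementation. It assumes one offer per host, simplified to `{:hostname .. :cpus ..}`, a task carrying a top-level `:cpus` instead of a `:resources` vector, and a hypothetical `allocate-instances` name.

```clojure
(defn allocate-instances
  "Hypothetical greedy allocator: places up to :count instances of a task,
  at most :maxcol per host and never more cpus than an offer has left.
  Assumes one offer per host. Returns a vector of placements, possibly
  shorter than :count when the offers are too small."
  [{:keys [task-id cpus maxcol] n :count} offers]
  (loop [free   (into {} (map (juxt :hostname :cpus) offers)) ; host -> cpus left
         placed {}                                            ; host -> instances
         result []]
    (let [candidates (->> (keys free)
                          (filter #(and (>= (free %) cpus)
                                        (< (get placed % 0) maxcol)))
                          ;; Spread instances: least loaded host first, ties by name.
                          (sort-by (juxt #(get placed % 0) identity)))
          host       (first candidates)]
      (if (or (nil? host) (= n (count result)))
        result
        (recur (update free host - cpus)
               (update placed host (fnil inc 0))
               (conj result {:hostname host
                             :task-id  (str task-id "-" (count result))}))))))
```

With three roomy hosts, `{:count 6 :maxcol 2}` yields two instances per host; with only two hosts the colocation limit caps the result at four instances rather than overloading a machine.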
  61. EXAMPLE FRAMEWORK: CRON

  62. CONFIGURATION

        [{:name      "analytics"
          :task-id   "analytics"
          :resources [{:name "mem"  :type :value-scalar :scalar 512.0}
                      {:name "cpus" :type :value-scalar :scalar 0.5}]
          :command   {:shell true
                      :value "java -jar /opt/analytics.jar"}
          :schedule  "*/20 * * * * * *"}
         {:name      "usagemetering"
          :task-id   "metering"
          :resources [{:name "mem"  :type :value-scalar :scalar 512.0}
                      {:name "cpus" :type :value-scalar :scalar 0.5}]
          :command   {:shell true
                      :value "java -jar /opt/metering.jar"}
          :schedule  "*/30 * * * * * *"}]
  63. STATE

        {:driver   ...
         :running? #{"analytics" "metering"}}
  64. TICKING

    cronj does a very good job at it. Send a message on every tick:

        (defn ticker
          [ch config]
          (cronj :entries
                 (for [{:keys [name schedule] :as payload} config]
                   {:id       name
                    :schedule schedule
                    ;; cronj calls handlers with two arguments (time and options)
                    :handler  (fn [_ _] (put! ch (assoc payload :type :start-task)))})))
  65. STRUCTURE

        (defmulti handle-message (fn [_ message] (:type message)))

        (defn mesos-cron
          [config]
          (let [ch        (chan)
                sched     (async/scheduler ch)
                framework {:name "mesos-cron"}
                driver    (sched/scheduler-driver sched framework "localhost:5050")
                state     {:driver driver :running? #{}}]
            (ticker ch config)
            (sched/start! driver)
            (a/reduce handle-message state ch)))
  66. STRUCTURE

  67. HANDLING RESOURCE OFFERS

        (defmethod handle-message :resource-offers
          [state {:keys [offers]}]
          (assoc state :offers offers))
  68. HANDLING TICKS

        (defmethod handle-message :start-task
          [state task]
          (let [{:keys [driver offers running?]} state]
            (cond
              (running? (:task-id task))
              (warn "job is already running, skipping tick")

              :else
              (doseq [[offer-id task] (allocate offers task)]
                (sched/launch-tasks! driver offer-id task))))
          state)
  69. UPDATING STATUS INFO

        (defmethod handle-message :status-update
          [state {:keys [task-id status]}]
          (if (= status :task-running)
            (update state :running? conj task-id)
            (update state :running? disj task-id)))
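Since these handlers are functions of state and message (apart from the driver call), the bookkeeping can be exercised without Mesos. A minimal sketch under assumptions: `should-launch?`, `on-status-update` and `replay-statuses` are hypothetical names, the launch itself is left out, and status messages are simplified to `{:task-id .. :status ..}`.

```clojure
(defn should-launch?
  "A tick only turns into a launch when the task is not already running."
  [state task]
  (not (contains? (:running? state) (:task-id task))))

(defn on-status-update
  "Same rule as the :status-update handler: :task-running adds the task id,
  any other status removes it."
  [state {:keys [task-id status]}]
  (if (= status :task-running)
    (update state :running? conj task-id)
    (update state :running? disj task-id)))

;; Replaying a stream of status messages is a plain reduce.
(defn replay-statuses
  [state statuses]
  (reduce on-status-update state statuses))
```

As on the slide, a task that is still staging is not yet in `:running?`, so a tick arriving before its `:task-running` update would launch it a second time; recording task ids at launch time would close that gap.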
  70. GOING FURTHER

  71. WHY CLOJURE IN THE FIRST PLACE

    Library-first approach. Plenty of facilities to make better decisions.
  72. HANDLING TOPOLOGY CHANGES

    Making use of Clojure's properties: observable atoms, clojure.data/diff, clojure.core.match/match.
  73. ATOM-BACKED CONFIGURATION

        (let [config (atom {})                    ;; 1. Hold config in atom
              ch     (chan)                       ;; 2. Create our message chan
              mesos  (make-framework! config ch)  ;; 3. Register w/ mesos
              ticker (tick/create! ch)]           ;; 4. Start scheduler
          (watch/watch-config config)             ;; 5. Listen for config changes
          (add-watch config :sync (converge! ch))) ;; 6. Update topology on changes
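The slide's `make-framework!`, `tick/create!` and `watch/watch-config` are not shown, so they are not reproduced here. This minimal sketch isolates only the atom-plus-watch mechanism: `converge-watch` and `demo` are hypothetical names, and instead of talking to Mesos the watch records each (old, new) pair so the effect is visible.

```clojure
(defn converge-watch
  "Hypothetical watch factory. add-watch calls the returned function with the
  key, the reference, the old value and the new value; here it only records
  changes in the `log` atom."
  [log]
  (fn [_key _ref old new]
    (when (not= old new)
      (swap! log conj {:old old :new new}))))

(defn demo
  "Holds the desired topology in an atom and returns the changes the watch saw."
  []
  (let [config (atom {})
        log    (atom [])]
    (add-watch config :sync (converge-watch log))
    (swap! config assoc "web" {:status :start :runtime :docker})
    (swap! config assoc-in ["web" :status] :stop)
    @log))
```

Atom watches run synchronously on the thread that changed the atom, which is why `demo` can read the log right after the two `swap!` calls.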
  74. CONVERGING TOPOLOGY

        (defn converge!
          [ch]
          (fn [_ _ old new]
            (let [side-effects (decisions old new)]
              (doseq [effect side-effects]
                (perform-side-effect ch effect)))))
  75. MAKING CLEAR DECISIONS

        (defn decisions
          [old new]
          (let [[before after _] (diff old new)   ; clojure.data/diff
                ;; keys added, removed or modified (clojure.set/union)
                changed (union (-> after keys set) (-> before keys set))]
            (mapcat (partial changed-map old new) changed)))
  76. MAKING CLEAR DECISIONS

        (defn changed-map
          [old new k]
          (let [old (get old k {:status :stop})
                new (get new k {:status :stop})]
            (match [(:runtime old) (:runtime new) (:status old) (:status new)]
              ;; Basic status changes
              [_ _ :stop :stop]        []
              [_ _ :start :stop]       [{:action :stop :task old}]
              [_ _ :stop :start]       [{:action :start :task new}]
              ;; Unit runtime changes
              [:docker :docker _ _]    []
              [:command :command _ _]  []
              [:command :docker _ _]   [{:action :stop :task old} {:action :start :task new}]
              [:docker :command _ _]   [{:action :stop :task old} {:action :start :task new}]
              :else                    [])))
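core.match is an external library, so to keep a runnable sketch dependency-free the following swaps it for a plain `cond` over the same four inputs (old and new runtime, old and new status), paired with a `decisions` function in the spirit of the previous slide that collects every key added, removed or modified. Task maps are assumed to carry `:status` (`:start` or `:stop`) and `:runtime` (`:docker` or `:command`), as the match clauses imply.

```clojure
(require '[clojure.data :refer [diff]]
         '[clojure.set :refer [union]])

(defn changed-map
  "Side effects needed to move task `k` from its old to its new definition.
  A task missing on either side counts as stopped. cond replaces core.match here."
  [old new k]
  (let [old (get old k {:status :stop})
        new (get new k {:status :stop})]
    (cond
      ;; Basic status changes
      (= [:stop :stop] [(:status old) (:status new)])   []
      (= [:start :stop] [(:status old) (:status new)])  [{:action :stop :task old}]
      (= [:stop :start] [(:status old) (:status new)])  [{:action :start :task new}]
      ;; Both running: restart only when the runtime changed
      (= (:runtime old) (:runtime new))                 []
      :else [{:action :stop :task old} {:action :start :task new}])))

(defn decisions
  "Diffs two topologies and returns the side effects that converge old into new."
  [old new]
  (let [[before after _] (diff old new)
        changed (union (set (keys before)) (set (keys after)))]
    (mapcat (partial changed-map old new) changed)))
```

Keeping `decisions` pure means every topology transition can be checked with plain data before any side effect reaches the cluster.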
  77. STRUCTURE

  78. WRAPPING THINGS UP

  79. CLOJURE HELPS

    core.async channels model the flow of incoming messages nicely. core.match brings clarity to your decision making. Immutability helps. Most of this is taken from bundes: https://github.com/pyr/bundes
  80. DID MESOS HELP?

    Using mesomatic, it's easy to ensure that your expected topology exists throughout the cluster. Containers provide good enough isolation. The ops burden is greatly reduced.
  81. FUTURE WORK

    A bigger contributor list :-) Following Mesos releases closely. Publishing example frameworks (Onyx would be great!). A pure JVM Mesos client? A more polished bundes.
  82. QUESTIONS?

    Thanks! @pyr

    If you are writing apps, check out: https://github.com/pyr/unilog