Apache Mesos NYC Meetup

benh
August 20, 2013

Transcript

  1. Benjamin Hindman (@benh)
     Apache Mesos
     mesos.apache.org
     @ApacheMesos
  2. download and install
     http://www.apache.org/dyn/closer.cgi/mesos/0.12.1
     $ tar zxf mesos-0.12.1.tar.gz
     $ cd mesos-0.12.1
     $ ./configure --prefix=/path/to/install/directory
     $ make install
  3. releases
     maintained: 0.12.1
     stable: 0.13.0 (0.13.0-rc7)
     development: 0.14.0 (0.14.0-rc1)
  4. development release
     $ git clone https://git.apache.org/mesos.git
     $ cd mesos
     $ ./bootstrap
     $ ./configure --prefix=/path/to/install/directory
     $ make install
  5. packages
     https://s3.amazonaws.com/mesos-pkg/ubuntu/12.10/mesos_0.14.0_amd64.deb
     https://s3.amazonaws.com/mesos-pkg/ubuntu/12.04/mesos_0.14.0_amd64.deb
     https://s3.amazonaws.com/mesos-pkg/debian/7.0/mesos_0.14.0_amd64.deb
     Contact info@mesosphe.re for more information or other packages

  6. packages
     https://github.com/deric/mesos-deb-packaging
     packaging support in 0.15.0
     nightly/weekly snapshots of development
  7. mesos  

  8. starting a master
     $ mesos-master --help
     $ mesos-master --ip=a.b.c.d
     $ MESOS_ip=a.b.c.d mesos-master
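The last two commands on the slide set the same option in two ways: as an `--ip` flag and as a `MESOS_ip` environment variable. A minimal Python sketch of that equivalence follows. The helper names are hypothetical, and the sketch assumes, from the slide's single example only, that an option named `x` can be given either as `--x=value` or as `MESOS_x=value`.

```python
def as_flags(options):
    """Render options as command-line flags, e.g. {'ip': 'a.b.c.d'} -> ['--ip=a.b.c.d']."""
    return [f"--{name}={value}" for name, value in options.items()]


def as_environment(options, prefix="MESOS_"):
    """Render the same options as prefixed environment variables, e.g. MESOS_ip."""
    return {f"{prefix}{name}": str(value) for name, value in options.items()}


options = {"ip": "a.b.c.d"}  # placeholder address, as on the slide
flags = as_flags(options)
environment = as_environment(options)
```

Either form could then be handed to a process launcher: `flags` appended to the `mesos-master` command, or `environment` merged into the process environment.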
  9. mesos  

  10. starting a (fault-tolerant) master
      $ mesos-master --zk=zk://ip1:port1,ip2:port2,…/mesos

  11. mesos, Apache ZooKeeper

  13. starting a slave
      $ mesos-slave --help
      $ mesos-slave --master=ip:port
      $ mesos-slave --master=zk://ip1:port1,ip2:port2,…/mesos
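Both the fault-tolerant master and the slave take a ZooKeeper URL of the form `zk://ip1:port1,ip2:port2,…/mesos`. Here is a small Python sketch for assembling such a URL from a list of servers. The function name and the example addresses are made up for illustration; only the URL shape and the `--zk` / `--master` flags come from the slides.

```python
def zk_url(servers, path="/mesos"):
    """Join (host, port) pairs into a zk:// URL shaped like the one on the slide."""
    if not servers:
        raise ValueError("at least one ZooKeeper server is required")
    hosts = ",".join(f"{host}:{port}" for host, port in servers)
    return f"zk://{hosts}{path}"


# Hypothetical three-server ZooKeeper ensemble.
url = zk_url([("10.0.0.1", 2181), ("10.0.0.2", 2181), ("10.0.0.3", 2181)])
master_cmd = ["mesos-master", f"--zk={url}"]
slave_cmd = ["mesos-slave", f"--master={url}"]
```

Generating the URL once and reusing it for both commands keeps masters and slaves pointed at the same ZooKeeper path.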
  14. mesos, Apache ZooKeeper

  17. now what?

  18. launch frameworks

  19. what’s a framework?

  20. framework ≈ distributed system

  21. frameworks
      • Hadoop (github.com/mesos/hadoop)
      • Spark (github.com/mesos/spark)
      • DPark (github.com/douban/dpark)
      • Storm (github.com/nathanmarz/storm)
      • Chronos (github.com/airbnb/chronos)
      • MPICH2 (not well maintained, email mailing list)
  22. framework commonality
      run processes simultaneously (distributed)
      handle process failures (fault-tolerance)
      optimize execution (elasticity, scheduling)
  23. mesos, Apache ZooKeeper, Apache Hadoop, Chronos
  28. but why?

  29. origins
      Berkeley research project including Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica
      mesos.apache.org/documentation
  30. static partitioning
      Apache Hadoop, Chronos

  31. static partitioning considered harmful
      Apache Hadoop, Chronos

  32. static partitioning considered harmful
      Apache Hadoop, Chronos
      (1) hard to utilize machines (e.g., 72 GB RAM and 24 CPUs)
  34. static partitioning considered harmful
      Apache Hadoop, Chronos
      (2) hard to scale elastically (to take advantage of statistical multiplexing)
  38. static partitioning considered harmful
      Apache Hadoop, Chronos
      (3) hard to deal with failures
  41. mesos: level of indirection
      Apache Hadoop, Chronos
  44. a “kernel” for the datacenter
      Apache Hadoop, Chronos

  45. primitives
      scheduler: distributed system “master”
      (executor: lower-level control of task execution, optional)
      requests/offers: resource allocations
      tasks: “threads” of the distributed system
      state: working set of the distributed system
      …
  46. scheduler
      Apache Hadoop, Chronos

  47. scheduler
      (1) brokers for resources (with master)
      (2) launches tasks
      (3) handles task termination

  48. brokering for resources
      (1) make resource requests
          2 CPUs
          1 GB RAM
          slave *
      (2) respond to resource offers
          4 CPUs
          4 GB RAM
          slave foo.bar.com
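To make the request/offer exchange concrete, here is a rough Python sketch of a scheduler deciding how many tasks to launch on an offer. It uses the slide's numbers and treats the request (2 CPUs, 1 GB RAM) as the size of a single task, which is an assumption. The `Resources` and `Offer` types and the `tasks_that_fit` helper are hypothetical and are not the Mesos scheduler API.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpus: float
    mem_gb: float


@dataclass
class Offer:
    slave: str
    resources: Resources


def tasks_that_fit(offer, per_task, pending):
    """Return how many of `pending` tasks of size `per_task` fit inside `offer`."""
    by_cpu = int(offer.resources.cpus // per_task.cpus)
    by_mem = int(offer.resources.mem_gb // per_task.mem_gb)
    return min(by_cpu, by_mem, pending)


# Slide numbers: each task wants 2 CPUs / 1 GB; the offer holds 4 CPUs / 4 GB.
offer = Offer(slave="foo.bar.com", resources=Resources(cpus=4, mem_gb=4))
per_task = Resources(cpus=2, mem_gb=1)
launched = tasks_that_fit(offer, per_task, pending=5)
```

With these numbers CPU is the limiting resource, so two tasks fit and half of the offered memory is left over.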
  49. offers: non-blocking resource allocation
      exist to answer the question: “what should mesos do if it can’t satisfy a request?”
      (1) wait until it can
      (2) offer the best allocation it can immediately
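The two answers on the slide can be contrasted in a few lines of Python. This is a deliberately simplified sketch: it tracks only CPUs, both function names are invented, and capping the offer at the requested amount is an assumption made to keep the comparison direct.

```python
def blocking_allocate(free_cpus, requested_cpus):
    """Option (1): grant nothing until the whole request can be satisfied."""
    return requested_cpus if free_cpus >= requested_cpus else 0


def offer_allocate(free_cpus, requested_cpus):
    """Option (2): immediately offer the best available amount, even if partial."""
    return min(free_cpus, requested_cpus)


free, wanted = 4, 6  # hypothetical cluster state and request
waited = blocking_allocate(free, wanted)
offered = offer_allocate(free, wanted)
```

Under option (1) the scheduler gets nothing and the four free CPUs sit idle; under option (2) it is offered four CPUs right away and can decide for itself whether a partial allocation is useful.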
  51. “two-level scheduling”
      mesos: controls resource allocations to schedulers
      schedulers: make decisions about what to run given allocated resources
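The split between the two levels can be sketched as two independent functions. In this illustrative Python example the first level uses an even split as a stand-in allocation policy (the slides do not say which policy Mesos uses), and the framework names, task names, and CPU counts are all hypothetical.

```python
def allocate(total_cpus, frameworks):
    """Level 1 (the mesos role): decide how many CPUs each scheduler is allocated."""
    share = total_cpus // len(frameworks)
    return {name: share for name in frameworks}


def schedule(allocated_cpus, queue):
    """Level 2 (a framework scheduler): choose which queued tasks to run, in order."""
    chosen, used = [], 0
    for task, cpus in queue:
        if used + cpus <= allocated_cpus:
            chosen.append(task)
            used += cpus
    return chosen


queues = {
    "hadoop": [("map-1", 2), ("map-2", 2), ("reduce-1", 4)],
    "chronos": [("cron-job", 1)],
}
allocations = allocate(8, list(queues))
plans = {name: schedule(allocations[name], queues[name]) for name in queues}
```

Note that `allocate` never looks inside a queue and `schedule` never sees the rest of the cluster, which is the separation of concerns the slide describes.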
  52. end-to-end principle
      “application-specific functions ought to reside in the end hosts of a network rather than intermediary nodes”

  53. tasks
      either a concrete command line or an opaque description (which requires a framework executor to execute)
      a consumer of resources
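One way to picture the two kinds of task is a record that carries its resource needs plus exactly one of the two forms. The Python type below is a hypothetical illustration, not the Mesos task message; the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    cpus: float
    mem_gb: float
    command: Optional[str] = None  # concrete command line
    data: Optional[bytes] = None   # opaque description, meaningful only to an executor

    def __post_init__(self):
        # Exactly one of the two forms must be provided.
        if (self.command is None) == (self.data is None):
            raise ValueError("set exactly one of command or data")

    def needs_executor(self):
        """An opaque task can only be run by a framework executor."""
        return self.command is None


shell_task = Task("sleep", cpus=1, mem_gb=0.5, command="sleep 10")
opaque_task = Task("custom", cpus=2, mem_gb=1, data=b"\x00\x01")
```

Both variants declare `cpus` and `mem_gb`, reflecting the slide's point that a task is a consumer of resources regardless of how it is described.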
  54. task operations
      launching/killing
      health monitoring/reporting (failure detection)
      resource usage monitoring (statistics)

  55. state (and replicated log)
      … when your distributed system needs state (the “working set”, often 10s to 100s of MB), what do you do?
      » a database is overkill (yet another system to manage)
      » ZooKeeper can work (but you probably want to use a higher level abstraction, and if you have more than 1MB you could be out of luck, and …)
      » can build your own distributed state machine …

  56. state (and replicated log)
      you probably don’t want Paxos, you want Multi-Paxos
      and Multi-Paxos is just a replicated log (i.e., a replicated log is an implementation of Multi-Paxos but with a nicer interface)
      in Mesos since 0.9.0 (including Java/JNI bindings), used in production for ~2 years
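The appeal of a replicated log is that every replica sees the same ordered entries, so replaying them yields the same state everywhere. The Python sketch below shows only that replay property. An in-memory list stands in for the log, there is no consensus or networking, and the class and entry format are invented for illustration rather than taken from the Mesos interface.

```python
class Log:
    """In-memory stand-in for a replicated log: append-only, ordered entries."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)
        return len(self._entries) - 1  # position of the new entry

    def read(self, start=0):
        return list(self._entries[start:])


def replay(entries):
    """Rebuild state by applying ('set', key, value) and ('delete', key) entries in order."""
    state = {}
    for entry in entries:
        if entry[0] == "set":
            state[entry[1]] = entry[2]
        elif entry[0] == "delete":
            state.pop(entry[1], None)
    return state


log = Log()
log.append(("set", "slave-1", "active"))
log.append(("set", "slave-2", "active"))
last = log.append(("delete", "slave-1"))

replica_a = replay(log.read())
replica_b = replay(log.read())
```

Because `replay` is deterministic, any two readers of the same log end up with identical state; agreeing on the order of entries is the part that Multi-Paxos would provide in a real system.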
  57. state (and replicated log)
      even a replicated log is fairly low-level (one of the reasons ZooKeeper is so popular) … enter “state”
          State* state = new State(new ReplicatedLogStorage());
          Future<Variable<Registry>> fetch = state->fetch("registry");
          Variable<Registry> variable = fetch.get();
          Registry registry = variable.get();
          registry.makeSomeUpdates();
          Variable<Registry> variable_ = variable.mutate(registry);
          Future<Option<Variable<Registry>>> store = state->store(variable_);
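The fetch, mutate, store sequence on the slide reads like optimistic concurrency: a variable remembers the version it was fetched at, and a store only succeeds if nobody else has stored since. The Python sketch below illustrates that idea under stated assumptions: it is synchronous (no futures), a dict replaces the replicated log storage, and returning `None` on a version conflict is an assumed reading of the `Option` in the slide's return type.

```python
import uuid


class Variable:
    """A named value plus the version it was fetched at."""

    def __init__(self, name, value, version):
        self.name, self.value, self.version = name, value, version

    def mutate(self, value):
        # Same name and version, new value; the version only changes on store.
        return Variable(self.name, value, self.version)


class State:
    """In-memory stand-in; a dict plays the role of the storage backend."""

    def __init__(self):
        self._storage = {}  # name -> (version, value)

    def fetch(self, name):
        version, value = self._storage.get(name, (None, None))
        return Variable(name, value, version)

    def store(self, variable):
        current_version = self._storage.get(variable.name, (None, None))[0]
        if current_version != variable.version:
            return None  # someone else stored first; the caller should re-fetch
        new_version = uuid.uuid4().hex
        self._storage[variable.name] = (new_version, variable.value)
        return Variable(variable.name, variable.value, new_version)


state = State()
first = state.fetch("registry")
stale = state.fetch("registry")
stored = state.store(first.mutate({"slaves": ["foo.bar.com"]}))
conflict = state.store(stale.mutate({"slaves": []}))
```

Here the first store wins and the second, made from a stale fetch, is rejected instead of silently overwriting the registry.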
  58. Mesos
      (diagram labels: Mesos, Node (repeated), Hadoop, Spark, MPI, …)

  59. Mesos
      (diagram labels: Mesos, Node (repeated), Hadoop, Spark, MPI, Storm, …)

  60. Mesos
      (diagram labels: Mesos, Node (repeated), Hadoop, Spark, MPI, Storm, Chronos, …)

  61. Mesos
      (diagram labels: Mesos, Node (repeated), Hadoop, Spark, MPI, Storm, Chronos)

  62. Questions …