Apache Mesos (Twitter Intern Open House)

benh
August 01, 2012

Tech talk on Apache Mesos for the Twitter Intern Open House.

Transcript

  1. Benjamin Hindman – @benh
    Jie Yu – @jie_yu
    Apache Mesos
    incubator.apache.org/mesos
    @ApacheMesos

  2. history
    Berkeley research project including Benjamin Hindman, Andy Konwinski,
    Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker,
    Ion Stoica
    incubator.apache.org/mesos/research.html

  3. motivation: static partitioning
    [diagram: nodes divided into fixed groups labeled analytics, service, service, …]

  4. frameworks
    services

  5. frameworks
    services

  6. static partitioning considered harmful
    [diagram: nodes divided into fixed groups labeled Hadoop, service, …]

  7. static partitioning considered harmful
    hard to fully utilize machines (e.g., 72 GB RAM and 24 CPUs)
    [diagram: nodes divided into fixed groups labeled Hadoop, service, …]

  8. static partitioning considered harmful
    harder to deal with failures
    [diagram: nodes divided into fixed groups labeled Hadoop, service, …, with one node marked X]

  9. static partitioning considered harmful
    harder to scale elastically
    [diagram: nodes divided into fixed groups labeled Hadoop, service, …, with three more nodes shown]

  10. Mesos
    [diagram: Mesos over four nodes labeled Hadoop and service, …, followed by node groups labeled Hadoop, service, …]

  11. level of indirection
    [diagram: Mesos over four nodes labeled Hadoop and service, …, followed by node groups labeled Hadoop, service, …]

  12. Mesos:
    1) efficiently share datacenter resources

  13. better utilization
    [diagram: Mesos over four nodes labeled Hadoop and service]

  14. better utilization
    [diagram: a single node]

  15. better utilization
    [diagram: a single node labeled Hadoop, service, Hadoop, Hadoop]

  16. better utilization
    [diagram: a single node labeled Hadoop, service, Hadoop, Hadoop]
    need per-machine isolation!

  17. easier to deal with failures
    [diagram: Mesos over four nodes labeled Hadoop and service, with one node marked X]

  18. enables elasticity
    [diagram: Mesos over four nodes labeled Hadoop and service]

  19. Mesos:
    1) efficiently share datacenter resources
    2) make it easier to build distributed services and analytics frameworks

  20. a “kernel” for the datacenter
    [diagram: Mesos over four nodes labeled Hadoop and service, …, followed by node groups labeled Hadoop, service, …]

  21. architecture
    [diagram: one Mesos master and two Mesos slaves]

  22. architecture
    [diagram: three Mesos masters and two Mesos slaves]

  23. services and frameworks
    1. scheduler

  24. architecture
    [diagram: Mesos master, two Mesos slaves, and a service Y scheduler]
    service Y scheduler: requests resources, assigns tasks

  25. services and frameworks
    1. scheduler
    2. executor (optional, if you don’t just want to run a single command)

  26. architecture
    [diagram: Mesos master, two Mesos slaves, a service Y scheduler, and a service Y executor with a service Y task (Netty server)]
    service Y executor: runs tasks, reports status updates

  27. architecture
    [diagram: Mesos master with an allocation module, two Mesos slaves, service X and service Y schedulers, and a service Y executor with a service Y task (Netty server)]
    allocation module: decides how to allocate resources

  28. “two-level scheduling”
    Mesos: controls resource allocations to applications/frameworks
    applications/frameworks: make decisions about what to run
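
The split described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Offer`, `Task`, `framework_accepts`, and `offer_round` are hypothetical names, not the Mesos API, and the round-robin choice of framework is a placeholder for whatever the allocation module actually decides.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    slave: str
    cpus: float
    mem_gb: float


@dataclass
class Task:
    name: str
    cpus: float
    mem_gb: float


def framework_accepts(offer, pending):
    """Second level: the framework decides which of its pending tasks to run."""
    launched = []
    cpus, mem = offer.cpus, offer.mem_gb
    for task in pending:
        if task.cpus <= cpus and task.mem_gb <= mem:
            launched.append(task)
            cpus -= task.cpus
            mem -= task.mem_gb
    return launched


def offer_round(free, frameworks):
    """First level: decide which framework is offered each slave's free resources.

    free: dict of slave name -> (cpus, mem_gb)
    frameworks: dict of framework name -> list of pending Task objects
    Returns a list of (framework, slave, task name) placements.
    """
    placements = []
    names = list(frameworks)
    for i, (slave, (cpus, mem)) in enumerate(sorted(free.items())):
        name = names[i % len(names)]  # placeholder policy: round-robin
        offer = Offer(slave, cpus, mem)
        for task in framework_accepts(offer, frameworks[name]):
            frameworks[name].remove(task)
            placements.append((name, slave, task.name))
    return placements
```

The point of the design is that `offer_round` never inspects the tasks themselves; it only hands out resources, and each framework keeps its own placement logic.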

  29. dominant resource-fairness
    default allocation policy (see incubator.apache.org/mesos/research.html for more info)
    help us write new allocators!
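
For readers unfamiliar with dominant resource fairness, here is a minimal Python sketch of the idea: repeatedly give the next task to the user whose dominant share (their largest fraction of any single resource) is lowest. It is a simplification, not the Mesos allocator: `drf_allocate` is a hypothetical name, every user is assumed to have unlimited identical tasks, and users whose next task no longer fits are simply skipped.

```python
def drf_allocate(total, demands):
    """Greedy dominant-resource-fairness sketch.

    total: tuple of cluster capacities, e.g. (cpus, mem_gb)
    demands: dict of user -> per-task demand tuple in the same order
    Returns a dict of user -> number of tasks launched.
    """
    used = [0.0] * len(total)
    alloc = {u: [0.0] * len(total) for u in demands}
    tasks = {u: 0 for u in demands}

    def dominant_share(user):
        # Largest fraction of any one resource this user currently holds.
        return max(a / t for a, t in zip(alloc[user], total))

    def fits(user):
        demand = demands[user]
        # An all-zero demand would loop forever, so treat it as not schedulable.
        return any(d > 0 for d in demand) and all(
            x + d <= t for x, d, t in zip(used, demand, total)
        )

    while True:
        candidates = [u for u in demands if fits(u)]
        if not candidates:
            return tasks
        user = min(candidates, key=dominant_share)
        for i, d in enumerate(demands[user]):
            used[i] += d
            alloc[user][i] += d
        tasks[user] += 1
```

With a cluster of 9 CPUs and 18 GB, a user A whose tasks need (1 CPU, 4 GB) and a user B whose tasks need (3 CPUs, 1 GB), `drf_allocate((9, 18), {"A": (1, 4), "B": (3, 1)})` launches three tasks for A and two for B, leaving both users with a dominant share of two thirds.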

  30. architecture
    [diagram: Mesos master with an allocation module, two Mesos slaves, service X and service Y schedulers, a service X executor with a task, and a service Y executor with a service Y task (Netty server); arrows labeled request and offer]
    Mesos slave: launches, isolates, and monitors tasks and executors

  31. “kernel” primitives for building frameworks
    messaging (unreliable)
    mechanisms for high-availability
    fault-detection
    resource isolation (cgroups)
    resource monitoring

  32. resource isolation in Mesos
    summer intern project (May – August)
    • why important?
    • how to achieve it?
    • current status

  33. w/o isolation
    [diagram: Mesos master with an allocation module, two Mesos slaves, hadoop and service schedulers, a hadoop executor with an Analytic Task, and a service executor with a service task (Netty server)]
    Total: 8 CPUs, 24G Memory
    before: each executor is offered 4 CPUs, 12G Memory; actual usage is 4 CPUs, 12G Memory
    Memory leak!
    after: each executor is still offered 4 CPUs, 12G Memory; actual usage is 4 CPUs, 22G Memory for one and 4 CPUs, 2G Memory for the other

  34. w/ isolation
    [diagram: Mesos master with an allocation module, two Mesos slaves, hadoop and service schedulers, a hadoop executor with an Analytic Task, and a service executor with a service task (Netty server)]
    Total: 8 CPUs, 24G Memory
    Memory leak!
    each executor is offered 4 CPUs, 12G Memory; actual usage is 4 CPUs, 12G Memory for both
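
The numbers on the two isolation slides can be reproduced with a toy model. This is a sketch under loud assumptions: `actual_usage` is a hypothetical helper, the executor names are made up, and "the greediest executor wins the race for memory" is a modeling shortcut rather than a description of how the kernel behaves.

```python
def actual_usage(total_gb, offered_gb, wanted_gb, isolation):
    """Return the memory each executor ends up with.

    total_gb: memory on the machine
    offered_gb: dict of executor -> memory it was offered
    wanted_gb: dict of executor -> memory it tries to use
    isolation: if True, each executor is capped at what it was offered
    """
    actual = {}
    remaining = total_gb
    # Assumption: without a cap, the executor that wants the most grabs it first.
    for name in sorted(wanted_gb, key=wanted_gb.get, reverse=True):
        limit = offered_gb[name] if isolation else remaining
        actual[name] = min(wanted_gb[name], limit, remaining)
        remaining -= actual[name]
    return actual


offered = {"leaky": 12, "other": 12}
wanted = {"leaky": 22, "other": 12}  # "leaky" has a memory leak

without = actual_usage(24, offered, wanted, isolation=False)
with_iso = actual_usage(24, offered, wanted, isolation=True)
```

Here `without` comes out as 22G for the leaking executor and 2G for the other, while `with_iso` keeps both at the 12G they were offered.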

  35. how to achieve it?
    virtual machines
    pros:
    ✔ strong isolation
    ✔ security
    cons:
    ✘ performance
    ✘ deployment
    ✘ debugging

  36. how to achieve it?
    OS containers
    pros:
    ✔ performance
    ✔ deployment
    cons:
    ✘ weak isolation
    ✘ security
    what we use in Mesos: Linux control groups

  37. Linux control groups (cgroups)
    isolation for CPU, memory, disk I/O, network I/O
    supported by existing Linux kernel
    low performance cost
    easy resource usage monitoring
    event notification mechanism
    support pause / resume
    simple interface to control
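
The "simple interface" is a set of files: limits are set and usage is read by writing to and reading from control files inside a cgroup directory. The sketch below wraps a few cgroup v1 control files (`memory.limit_in_bytes`, `memory.usage_in_bytes`, `cpu.shares`, `freezer.state`, `tasks`). It is not the Mesos implementation: the `Cgroup` class is hypothetical, and it assumes the memory, cpu, and freezer subsystems are mounted together so that one directory holds all of these files.

```python
import os


class Cgroup:
    """Thin wrapper over one cgroup directory (cgroup v1 control files)."""

    def __init__(self, path):
        # Creating a directory under a mounted hierarchy creates the cgroup.
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _write(self, control, value):
        with open(os.path.join(self.path, control), "w") as f:
            f.write(str(value))

    def _read(self, control):
        with open(os.path.join(self.path, control)) as f:
            return f.read().strip()

    def add_pid(self, pid):
        """Move a process into this cgroup."""
        self._write("tasks", pid)

    def set_memory_limit(self, nbytes):
        self._write("memory.limit_in_bytes", nbytes)

    def set_cpu_shares(self, shares):
        self._write("cpu.shares", shares)

    def memory_usage(self):
        return int(self._read("memory.usage_in_bytes"))

    def freeze(self):
        """Pause every process in the cgroup."""
        self._write("freezer.state", "FROZEN")

    def thaw(self):
        """Resume the processes paused by freeze()."""
        self._write("freezer.state", "THAWED")
```

On a real machine the path would sit under the cgroup mount point and the calls would typically need root; pointed at an ordinary directory, the class just writes plain files, which is enough to see the shape of the interface.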

  38. current status
    support isolation for CPUs and memory
      -- easily extensible to support disk I/O
    support out-of-memory event notification
      -- admin can define policies (e.g. kill, pause)
    support pausing and resuming executors
    support monitoring actual resource usage
      -- including a new front-end UI
    ready to be checked in!

  39. monitoring realtime resource usage for each executor

  40. Mesos
    [diagram: Mesos over two rows of four nodes, labeled Hadoop, …, Spark]

  41. Mesos at Twitter
    [diagram: Mesos over two rows of four nodes, labeled Hadoop, …, Spark, Storm]

  42. demo

  43. analytics
    • Hadoop (0.20.205 and 0.20.2-cdh3u3)
    • MPICH2 (Open Source MPI framework)
    • Spark (github.com/mesos/spark)
    • DPark (github.com/douban/dpark)
    • Storm (github.com/nathanmarz/storm)

  44. details
    built in C++, APIs in C++, Java, Python
    uses libprocess for asynchronous, actor-style concurrency (github.com/libprocess)

  45. genomics researchers using Hadoop and Spark
    Building a new framework for job workflows, wants to use Spark and Hadoop too
    Built DPark (a Python clone of Spark), also running MPI
    Hadoop and Spark used by machine learning researchers

  46. try it out!
    run on bare-metal or virtual machines – develop against Mesos API and run
    in private datacenter, or the cloud, or both!

  47. questions?
    incubator.apache.org/mesos
    @ApacheMesos

  48. Twitter Open Source
    twitter.github.com
    @TwitterOSS