
Running YARN alongside Mesos - MesosCon 2014

mohit
August 21, 2014

Almost anyone considering Mesos for their datacenter will run into an interesting challenge: running Mesos alongside YARN. Mesos and YARN are both resource managers, each with its own architecture, protocol, and ecosystem. Today, running the two side by side means statically partitioning the datacenter, which is exactly what each manager is designed to avoid when it is the sole resource manager in control. This slide deck presents a solution that lets Mesos and YARN co-exist and share resources without partitioning the datacenter.


Transcript

  1. Running YARN alongside Mesos
    Mohit Soni, eBay Inc.
    Renan DelValle, SUNY Binghamton


  2. About Mesos
    •  Cluster manager
    •  Two-level scheduler
    •  Supports service and analytical jobs
    [Diagram: Marathon, MPI, Spark, and Aurora running as frameworks on Mesos across a pool of nodes]


  3. About YARN
    •  Resource Negotiator
    •  Single-level scheduler
    •  Supports different types of analytical jobs


  4. Problem Statement
    •  Independent resource managers statically partition the datacenter
    •  Mesos supports long-running services and analytics workloads
    •  YARN's ecosystem is currently centered on analytics/data processing

    Goal
    Share resources between YARN and Mesos, with Mesos acting as the resource manager for the datacenter.

    Solution Characteristics
    •  Non-intrusive: avoids modifying Mesos or YARN protocols
    •  Easy future upgrades
    •  Easier certification path
    •  Use YARN's scheduling data for providing & rescinding resources to YARN


  5. YARN Architecture Overview
    [Diagram: the YARN ResourceManager coordinating NodeManagers on each YARN node; the NodeManagers host the AppMaster and containers]


  6. How it works
    [Diagram: the YARN ResourceManager, the Mesos master with the Control Plane framework, and a node running a Mesos slave]


  7. How it works
    [Diagram: the Control Plane asks Mesos to launch a YARN NodeManager on the node's Mesos slave as a 2.5 CPU / 2.5 GB Mesos task]
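
    How the Control Plane might carry out the "Launch NodeManager" step above: a minimal sketch against the 2014-era Mesos Python bindings (mesos.interface / mesos.native). The command line, sizes, and names are illustrative placeholders, not the deck's implementation (the deck launches the NodeManager through an Aurora job, shown on the last slide).

    from mesos.interface import Scheduler, mesos_pb2
    import mesos.native

    class NodeManagerLauncher(Scheduler):
        """Accepts the first offer and launches a YARN NodeManager as a Mesos task."""

        def resourceOffers(self, driver, offers):
            for offer in offers:
                task = mesos_pb2.TaskInfo()
                task.task_id.value = "yarn-nodemanager-1"
                task.slave_id.value = offer.slave_id.value
                task.name = "yarn-nodemanager"
                # Placeholder command; in the deck this is wrapped in an Aurora job.
                task.command.value = "/usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager"

                cpus = task.resources.add()
                cpus.name, cpus.type = "cpus", mesos_pb2.Value.SCALAR
                cpus.scalar.value = 2.5          # the 2.5 CPU shown on this slide

                mem = task.resources.add()
                mem.name, mem.type = "mem", mesos_pb2.Value.SCALAR
                mem.scalar.value = 2.5 * 1024    # 2.5 GB, expressed in MB

                driver.launchTasks(offer.id, [task])
                return

    framework = mesos_pb2.FrameworkInfo(user="", name="yarn-control-plane")
    driver = mesos.native.MesosSchedulerDriver(NodeManagerLauncher(), framework, "zk://localhost:2181/mesos")
    # driver.run() would register with the master and start receiving offers.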


  8. How it works
    [Diagram: the NodeManager runs inside the 2.5 CPU / 2.5 GB Mesos task and advertises 2.0 CPU / 2.0 GB of container capacity to the YARN ResourceManager]


  9. How it works
    [Diagram: the YARN ResourceManager launches containers C1 and C2 through the NodeManager running on the Mesos slave]


  10. How it works
    cgroups hierarchy (the NodeManager's hadoop-yarn hierarchy is nested under its own Mesos task's cgroup):
    /sys/fs/cgroup/cpu/mesos/<task-id>/hadoop-yarn/C1
    /sys/fs/cgroup/cpu/mesos/<task-id>/hadoop-yarn/C2
    [Diagram: YARN containers C1 and C2 on the node are therefore accounted against the NodeManager's Mesos task]
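
    A minimal Python sketch (not from the deck) of the yarn-site.xml rewrite that makes this nesting work; the Aurora job on the last slide does the same thing with sed. The sandbox layout assumption (the Mesos task id being the second-to-last component of the sandbox path) mirrors that job's awk trick; the file paths are the deck's.

    import os
    import re

    YARN_SITE = "/usr/local/hadoop/etc/hadoop/yarn-site.xml"

    def nested_hierarchy(sandbox_cwd):
        # The sandbox cwd ends in <mesos-task-id>/sandbox, so the task id is the
        # second-to-last path component (the awk '{ print $(NF-1) }' in the Aurora job).
        task_id = sandbox_cwd.rstrip("/").split("/")[-2]
        return "mesos/%s/hadoop-yarn" % task_id

    def rewrite_yarn_site(path=YARN_SITE):
        # Point the NodeManager's cgroups hierarchy at this task's cgroup,
        # replicating: sed -i "s@mesos.*/hadoop-yarn@mesos/$MY_TASK_ID/hadoop-yarn@g"
        hierarchy = nested_hierarchy(os.getcwd())
        with open(path) as f:
            conf = f.read()
        with open(path, "w") as f:
            f.write(re.sub(r"mesos.*/hadoop-yarn", hierarchy, conf))

    if __name__ == "__main__":
        rewrite_yarn_site()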


  11. Scenario 1: Handling Traffic Spikes
    [Diagram: two Hadoop cluster nodes (8.5 CPU / 8.5 GB each), each running a Mesos slave and an 8 CPU / 8 GB YARN NodeManager; a traffic spike hits ebay.com/sch]


  12. Scenario 1: Handling Traffic Spikes
    [Diagram: the Control Plane resizes each NodeManager down to 6 CPU / 6 GB, freeing capacity on both nodes]


  13. Scenario 1: Handling Traffic Spikes
    [Diagram: Mesos deploys the sch service (2 CPU / 2 GB) on each node alongside the resized 6 CPU / 6 GB NodeManagers]


  14. Scenario 1: Handling Traffic Spikes
    [Diagram: with the extra sch instances serving traffic, ebay.com/sch survives the spike; the NodeManagers keep running at 6 CPU / 6 GB]


  15. Scenario 1: Handling Traffic Spikes
    [Diagram: once the spike subsides, the Control Plane restores the cluster, growing each NodeManager back to 8 CPU / 8 GB]


  16. YARN Pending Improvements
    •  Restarting a NodeManager kills its child containers (YARN-1336)
    •  Restarting an AppMaster kills its child containers across all nodes (YARN-1489)
    •  NodeManager's pending support for the cgroups memory subsystem
    •  Getting richer scheduling information from the ResourceManager


  17. Future Vision
    •  One unified cluster per datacenter
    •  YARN and Mesos tasks co-exist on nodes
    •  Provision resources on demand
    [Diagram: the YARN ResourceManager and the Control Plane framework sit alongside the Mesos master, which manages a single pool of nodes]


  18. Scenario 2: Scaling YARN
    [Diagram: two 8 CPU / 8 GB nodes; Service X (2 CPU / 2 GB) runs on each node, Service Y (6 CPU / 6 GB) on one node, and a 4 CPU / 4 GB YARN NodeManager on the other]


  19. Scenario 2: Scaling YARN
    [Diagram: a user submits a job to YARN; its AppMaster (A), map (M), and reduce (R) tasks target the 4 CPU / 4 GB NodeManager]


  20. Scenario 2: Scaling YARN
    [Diagram: the user job's AppMaster (A) occupies the NodeManager, while the map (M) and reduce (R) tasks remain pending]


  21. Scenario 2: Scaling YARN
    [Diagram: the Control Plane detects starvation of the user job's pending tasks]


  22. Scenario 2: Scaling YARN
    [Diagram: the Control Plane preempts Service Y (6 CPU / 6 GB), freeing resources on its node]


  23. Scenario 2: Scaling YARN
    [Diagram: the Control Plane launches a second 4 CPU / 4 GB NodeManager on the freed node]


  24. Scenario 2: Scaling YARN
    [Diagram: the YARN ResourceManager schedules the M and R tasks on the new NodeManager, while the AppMaster (A) keeps running on the first]
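
    The scenario above can be read as a small control loop inside the Control Plane. The sketch below is hypothetical (the deck shows no control-plane code): the three helpers are stubs standing in for the YARN-2408 snapshot on the next slide, a preemption call to Mesos, and an Aurora NodeManager launch like the sample job at the end of the deck.

    import time

    NM_PROFILE = {"cpus": 4.0, "mem_gb": 4.0}   # NodeManager size used in this scenario

    def pending_yarn_demand():
        # Stub: would aggregate the ResourceManager's outstanding resource requests.
        return {"cpus": 2.0, "mem_gb": 2.0}     # e.g. the pending M and R tasks

    def free_nodemanager_capacity():
        # Stub: would subtract running containers from registered NodeManager capacity.
        return {"cpus": 0.0, "mem_gb": 0.0}     # the single NodeManager is full

    def preempt_lowest_priority_service(needed):
        print("preempting a low-priority service to free %(cpus)s CPU / %(mem_gb)s GB" % needed)

    def launch_nodemanager(profile):
        print("launching a NodeManager with %(cpus)s CPU / %(mem_gb)s GB" % profile)

    def control_loop(iterations, poll_seconds=30):
        for _ in range(iterations):
            demand, capacity = pending_yarn_demand(), free_nodemanager_capacity()
            if demand["cpus"] > capacity["cpus"] or demand["mem_gb"] > capacity["mem_gb"]:
                preempt_lowest_priority_service(NM_PROFILE)   # Scenario 2: preempt Service Y
                launch_nodemanager(NM_PROFILE)                # then grow YARN horizontally
            time.sleep(poll_seconds)

    if __name__ == "__main__":
        control_loop(iterations=1, poll_seconds=0)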


  25. New YARN API (YARN-2408)
    •  Resource Requests snapshot API:
       •  Memory
       •  Virtual Cores
       •  Locality constraint
    •  REST API with JSON & XML output
    •  Non-intrusive: simply exposes more information from the ResourceManager
    •  Helps the control plane decide NodeManager sizing (see the sketch below)
    [Sample XML response elided: application/attempt identifiers, queue, and per-request entries with memory, virtual cores, and locality (host, rack, *)]
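
    A sketch of how the control plane could consume this snapshot to measure pending demand. The endpoint path and JSON field names below are assumptions for illustration (the deck only lists the fields exposed), so treat them as placeholders rather than the documented API shape.

    import json
    import urllib2   # Python 2 stdlib, matching the 2014-era tooling in the deck

    RM_URL = "http://resourcemanager:8088"   # placeholder ResourceManager address

    def fetch_resource_requests(app_id):
        # Assumed REST path; YARN-2408 exposes the snapshot, but the exact URL
        # and JSON layout here are illustrative.
        url = "%s/ws/v1/cluster/apps/%s/resourcerequests" % (RM_URL, app_id)
        return json.load(urllib2.urlopen(url))

    def pending_demand(snapshot):
        """Aggregate outstanding memory (MB) and virtual cores across requests."""
        total_mb, total_vcores = 0, 0
        for req in snapshot.get("resourceRequests", []):
            count = req.get("numContainers", 0)
            total_mb += req.get("memory", 0) * count
            total_vcores += req.get("vCores", 0) * count
        return total_mb, total_vcores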


  26. Control Plane (Mesos Framework?)
    Design scope (sketched below):
    •  Flex up or down, vertically or horizontally?
    •  Determining the NodeManager profile for flex up
       -  Small (2 CPU, 4 GB RAM) or Large (8 CPU, 24 GB RAM)
    •  Choosing NodeManager(s) to flex down, avoiding ones
       -  which run the AppMaster container
       -  whose child containers are critical (e.g. HBase RegionServers)
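
    A hypothetical sketch of the two decisions above; the NodeManager/Container records and the threshold are illustrative, and only the criteria come from the slide.

    from collections import namedtuple

    Container = namedtuple("Container", ["id", "is_app_master", "is_critical"])
    NodeManager = namedtuple("NodeManager", ["id", "cpus", "ram_gb", "containers"])

    SMALL = {"cpus": 2, "ram_gb": 4}     # Small NodeManager profile (this slide)
    LARGE = {"cpus": 8, "ram_gb": 24}    # Large NodeManager profile (this slide)

    def profile_for_flex_up(pending_mb, pending_vcores):
        """Pick a NodeManager profile for flex up from pending YARN demand."""
        if pending_mb > SMALL["ram_gb"] * 1024 or pending_vcores > SMALL["cpus"]:
            return LARGE
        return SMALL

    def flex_down_candidates(node_managers):
        """NodeManagers safe to flex down: no AppMaster and no critical containers
        (e.g. HBase RegionServers)."""
        return [nm for nm in node_managers
                if not any(c.is_app_master or c.is_critical for c in nm.containers)]

    # Example: only nm2 is a safe flex-down candidate.
    nm1 = NodeManager("nm1", 8, 24, [Container("c1", True, False)])
    nm2 = NodeManager("nm2", 2, 4, [Container("c2", False, False)])
    print([nm.id for nm in flex_down_candidates([nm1, nm2])])   # ['nm2']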


  27. Thanks!


  28. Sample Aurora Job
    # imports and other definitions elided (BASHRC, YARN_DAEMONS, ROLE, instances)
    pre_cleanup = Process(...)
    make_cgroups_dir = Process(...)

    # Rewrite yarn-site.xml so the NodeManager's cgroup hierarchy is nested
    # under this Mesos task's cgroup (mesos/<task-id>/hadoop-yarn, see slide 10).
    configure_cgroups = Process(
        name = 'configure_cgroups',
        cmdline = "MY_TASK_ID=`pwd | awk -F'/' '{ print $(NF-1) }'` && "
                  "echo 'hadoop' | sudo -S sed -i "
                  "\"s@mesos.*/hadoop-yarn@mesos/$MY_TASK_ID/hadoop-yarn@g\" "
                  "/usr/local/hadoop/etc/hadoop/yarn-site.xml")

    start = Process(name = 'start', cmdline = "source %s; %s start nodemanager; sleep 10;" % (BASHRC, YARN_DAEMONS))

    # Keep the Aurora task alive for as long as the NodeManager process is alive.
    monitor = Process(
        name = 'monitor',
        cmdline = "sleep 10; PID=`cat /tmp/yarn-hduser-nodemanager.pid`; "
                  "echo 'Monitoring nodemanager pid: ' ${PID}; "
                  "while [ -e /proc/${PID} ]; do sleep 1; done")

    stop = Process(name = 'stop', final = True, cmdline = "source %s; %s stop nodemanager" % (BASHRC, YARN_DAEMONS))

    template_task = Task(
        processes = [pre_cleanup, make_cgroups_dir, configure_cgroups, start, monitor, stop],
        constraints = order(pre_cleanup, make_cgroups_dir, configure_cgroups, start, monitor) + order(stop)
    )

    small_task = template_task(name = 'small_task', resources = Resources(cpu=1.0, ram=512*MB, disk=2048*MB))

    large_task = template_task(name = 'large_task', resources = Resources(cpu=2.0, ram=2048*MB, disk=2048*MB))

    jobs = [Service(task = large_task, instances = instances, cluster = 'devcluster',
                    role = ROLE, environment = 'devel', name = 'yarnlarge')]

    # Job config for a small task.
    # small_jobs = [Service(task = small_task, instances = instances, cluster = 'devcluster',
    #                       role = ROLE, environment = 'devel', name = 'yarnsmall')]
     
