
Running YARN alongside Mesos - MesosCon 2014

mohit
August 21, 2014


Almost anyone considering Mesos for their datacenter will run into an interesting challenge: running Mesos alongside YARN. Mesos and YARN are both resource managers, each with its own architecture, protocol, and surrounding ecosystem. Today, running the two side by side statically partitions the datacenter, something both managers are designed to avoid when they are the sole resource manager in control. This slide deck presents a novel solution that allows Mesos and YARN to co-exist and share resources without partitioning the datacenter.


Transcript

  1. Running YARN alongside Mesos
     Mohit Soni (eBay Inc), Renan DelValle (SUNY Binghamton)

  2. About Mesos
     • Cluster manager
     • Two-level scheduler
     • Supports service and analytical jobs
     (Diagram: a Mesos master managing four nodes, with Marathon, MPI, Spark, and Aurora frameworks on top.)

  3. About YARN
     • Resource Negotiator
     • Single-level scheduler
     • Supports different types of analytical jobs

  4. Problem Statement
     • Independent resource managers statically partition the datacenter
     • Mesos supports long-running services and analytics workloads
     • The YARN ecosystem is currently centered on analytics/data processing

     Goal
     Share resources between YARN and Mesos, with Mesos being the resource manager for the datacenter.

     Solution Characteristics
     • Non-intrusive: avoids modifying Mesos or YARN protocols
     • Easy future upgrades
     • Easier certification path
     • Uses YARN's scheduling data for providing and rescinding resources to YARN

  5. YARN Architecture Overview
     (Diagram: a YARN ResourceManager coordinating a NodeManager on each node; an AppMaster and its containers run inside the NodeManagers.)

  6. How it works
     (Diagram: the control plane combines the YARN ResourceManager and a Mesos framework with the Mesos master; each node runs a Mesos slave.)

  7. How it works
     (Diagram: the control plane accepts a 2.5 CPU / 2.5 GB offer from a node's Mesos slave and launches a YARN NodeManager as a Mesos task; a sketch of this step follows.)

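A minimal sketch of this launch step, assuming the Python bindings that shipped with Mesos at the time (mesos.interface); the class name, task naming, and NodeManager start command are placeholders, while the 2.5 CPU / 2.5 GB figures come from the slide.

    # Hypothetical sketch: the "Mesos framework" half of the control plane uses an
    # incoming offer to launch a YARN NodeManager as a Mesos task.
    from mesos.interface import Scheduler, mesos_pb2

    NM_START_CMD = "yarn-daemon.sh start nodemanager"   # placeholder start command

    class YarnControlPlaneScheduler(Scheduler):
        def resourceOffers(self, driver, offers):
            for offer in offers:
                task = mesos_pb2.TaskInfo()
                task.task_id.value = "nodemanager-" + offer.slave_id.value
                task.slave_id.value = offer.slave_id.value
                task.name = "yarn-nodemanager"
                task.command.value = NM_START_CMD

                # Carve 2.5 CPU / 2.5 GB out of the offer for the NodeManager,
                # matching the figures on the slide.
                cpus = task.resources.add()
                cpus.name = "cpus"
                cpus.type = mesos_pb2.Value.SCALAR
                cpus.scalar.value = 2.5

                mem = task.resources.add()
                mem.name = "mem"
                mem.type = mesos_pb2.Value.SCALAR
                mem.scalar.value = 2.5 * 1024   # MB

                driver.launchTasks(offer.id, [task])

To actually run, this scheduler would be registered with the Mesos master through a MesosSchedulerDriver; that wiring is omitted here.
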
  8. How it works
     (Diagram: the NodeManager occupies its 2.5 CPU / 2.5 GB Mesos task and advertises 2.0 CPU / 2.0 GB of container capacity to the ResourceManager.)

  9. How it works
     (Diagram: the ResourceManager launches containers C1 and C2 on the node through the NodeManager.)

  10. How it works
      cgroups hierarchy:
      /sys/fs/cgroup/cpu/mesos/<mesos-id>/hadoop-yarn/C1
      /sys/fs/cgroup/cpu/mesos/<mesos-id>/hadoop-yarn/C2
      (Diagram: YARN containers C1 and C2 are nested under the Mesos task's cgroup, so their usage is accounted against the NodeManager's Mesos allocation; a sketch of the yarn-site.xml rewrite follows.)

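A minimal sketch of how the NodeManager's cgroups hierarchy can be pointed at its own Mesos task's cgroup before startup, mirroring the sed step in the Aurora job on slide 27; the yarn-site.xml path comes from that job, while the function name and the assumption that the configured hierarchy value already ends in "hadoop-yarn" are ours.

    # Hypothetical sketch: before starting the NodeManager, rewrite the cgroups
    # hierarchy configured in yarn-site.xml from "mesos/<old-task-id>/hadoop-yarn"
    # to "mesos/<current-task-id>/hadoop-yarn", so every YARN container is created
    # inside this NodeManager's own Mesos cgroup.
    import re

    YARN_SITE = "/usr/local/hadoop/etc/hadoop/yarn-site.xml"

    def nest_yarn_cgroups(mesos_task_id):
        with open(YARN_SITE) as f:
            conf = f.read()
        conf = re.sub(r"mesos[^<]*?/hadoop-yarn",
                      "mesos/%s/hadoop-yarn" % mesos_task_id,
                      conf)
        with open(YARN_SITE, "w") as f:
            f.write(conf)
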
  11. Scenario 1: Handling Traffic Spikes
      (Diagram: two Hadoop cluster nodes, each with 8.5 CPU / 8.5 GB, run NodeManagers sized at 8 CPU / 8 GB; a traffic spike hits ebay.com/sch.)

  12. Scenario 1: Handling Traffic Spikes
      (Diagram: the control plane resizes both NodeManagers down to 6 CPU / 6 GB, freeing capacity on each node.)

  13. Scenario 1: Handling Traffic Spikes
      (Diagram: the sch service is deployed via Mesos at 2 CPU / 2 GB on each node, alongside the resized 6 CPU / 6 GB NodeManagers.)

  14. Scenario 1: Handling Traffic Spikes
      (Diagram: ebay.com/sch survives the spike, with the sch service running next to the NodeManagers on both nodes.)

  15. Scenario 1: Handling Traffic Spikes
      (Diagram: once the spike passes, the control plane restores both NodeManagers to 8 CPU / 8 GB; a sketch of the whole sequence follows.)

  16. YARN Pending Improvements
      • Restarting a NodeManager kills its child containers (YARN-1336)
      • Restarting an AppMaster kills its child containers across all nodes (YARN-1489)
      • NodeManager's pending support for the cgroups memory subsystem
      • Getting richer scheduling information from the ResourceManager

  17. Future Vision
      • One unified cluster per datacenter
      • YARN and Mesos tasks co-exist on nodes
      • Provision resources on demand
      (Diagram: a single control plane, YARN ResourceManager + Mesos framework + Mesos master, managing all nodes in the datacenter.)

  18. Scenario 2: Scaling YARN
      (Diagram: two 8 CPU / 8 GB nodes. Node 1 runs Service X (2 CPU / 2 GB) and Service Y (6 CPU / 6 GB); Node 2 runs Service X (2 CPU / 2 GB) and a NodeManager (4 CPU / 4 GB).)

  19. Scenario 2: Scaling YARN
      (Diagram: a user job with AppMaster A and tasks M and R is submitted to YARN; only the 4 CPU / 4 GB NodeManager on Node 2 is available.)

  20. Scenario 2: Scaling YARN
      (Diagram: the AppMaster (A) starts on Node 2's NodeManager; tasks M and R remain pending.)

  21. Scenario 2: Scaling YARN
      (Diagram: the control plane detects starvation of the user job's pending tasks.)

  22. Scenario 2: Scaling YARN
      (Diagram: the control plane preempts Service Y on Node 1, freeing 6 CPU / 6 GB.)

  23. Scenario 2: Scaling YARN
      (Diagram: the control plane launches a second NodeManager (4 CPU / 4 GB) on Node 1 using the freed resources.)

  24. Scenario 2: Scaling YARN
      (Diagram: the ResourceManager schedules M and R on the new NodeManager on Node 1.)

  25. New YARN API (YARN-2408)
      • Resource Requests snapshot API:
        • Memory
        • Virtual Cores
        • Locality constraint
      • REST API with JSON & XML output
      • Non-intrusive: simply exposes more information from the ResourceManager
      • Helps the control plane decide NodeManager sizing

      Sample XML output (a sketch of consuming this snapshot follows):
      <resourceRequests>
        <MB>96256</MB>
        <VCores>94</VCores>
        <appMaster>
          <applicationId>application_</applicationId>
          <applicationAttemptId>appattempt_</applicationAttemptId>
          <queueName>default</queueName>
          <totalPendingMB>96256</totalPendingMB>
          <totalPendingVCores>94</totalPendingVCores>
          <numResourceRequests>3</numResourceRequests>
          <resourceRequests>
            <request>
              <MB>1024</MB>
              <VCores>1</VCores>
              <resourceName>/default-rack</resourceName>
              <numContainers>94</numContainers>
              <relaxLocality>true</relaxLocality>
              <priority>20</priority>
            </request>
            <request>
              <MB>1024</MB>
              <VCores>1</VCores>
              <resourceName>*</resourceName>
              <numContainers>94</numContainers>
              <relaxLocality>true</relaxLocality>
              <priority>20</priority>
            </request>
            <request>
              <MB>1024</MB>
              <VCores>1</VCores>
              <resourceName>master</resourceName>
              <numContainers>94</numContainers>
              <relaxLocality>true</relaxLocality>
              <priority>20</priority>
            </request>
          </resourceRequests>
        </appMaster>
      </resourceRequests>

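A minimal sketch of how a control plane might consume this snapshot. YARN-2408 was still a proposal, so the endpoint URL and port here are assumptions; the element names and the totals follow the sample XML on this slide, and the starvation thresholds are arbitrary.

    # Hypothetical sketch: poll the resource-requests snapshot exposed by the
    # ResourceManager (YARN-2408) and total up pending memory and vcores.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    SNAPSHOT_URL = "http://resourcemanager:8088/ws/v1/cluster/resourceRequests"  # assumed path

    def pending_resources():
        xml = urlopen(SNAPSHOT_URL).read()
        root = ET.fromstring(xml)
        pending_mb = int(root.findtext("MB", default="0"))
        pending_vcores = int(root.findtext("VCores", default="0"))
        return pending_mb, pending_vcores

    def yarn_is_starved(threshold_mb=1024, threshold_vcores=1):
        # A sustained backlog of pending requests means YARN needs more
        # NodeManager capacity, i.e. the control plane should flex up.
        mb, vcores = pending_resources()
        return mb >= threshold_mb or vcores >= threshold_vcores
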
  26. Control Plane (Mesos Framework?)
      Design scope:
      • Flex up or down, vertically or horizontally?
      • Determining the NodeManager profile for flex up
        – Small (2 CPU, 4 GB RAM), or Large (8 CPU, 24 GB RAM)
      • Choosing NodeManager(s) to flex down, avoiding ones
        – that run the AppMaster container
        – whose child containers are critical (e.g. HBase RegionServers)
      A sketch of these decisions follows.

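A minimal sketch of the two decisions above, under assumptions not in the deck: the pending totals come from the YARN-2408 snapshot, and each NodeManager is described by a small record whose fields (runs_app_master, has_critical_containers) are hypothetical.

    # Hypothetical sketch of the control plane's flex decisions.
    from collections import namedtuple

    # NodeManager profiles from the slide.
    SMALL = {"cpu": 2, "ram_gb": 4}
    LARGE = {"cpu": 8, "ram_gb": 24}

    NodeManager = namedtuple("NodeManager",
                             "node runs_app_master has_critical_containers")

    def profile_for_flex_up(pending_mb, pending_vcores):
        """Pick a NodeManager profile large enough for the pending backlog."""
        if pending_vcores <= SMALL["cpu"] and pending_mb <= SMALL["ram_gb"] * 1024:
            return SMALL
        return LARGE

    def flex_down_candidates(node_managers):
        """Prefer NodeManagers that neither host an AppMaster nor run critical
        containers (e.g. HBase RegionServers), per the slide."""
        return [nm for nm in node_managers
                if not nm.runs_app_master and not nm.has_critical_containers]
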
  27. Sample Aurora Job
      # imports

      pre_cleanup = Process(...)
      make_cgroups_dir = Process(...)

      # Point YARN's cgroups hierarchy at this Mesos task's cgroup
      # (the task id is the parent directory of the sandbox).
      configure_cgroups = Process(name = 'configure_cgroups', cmdline = "MY_TASK_ID=`pwd | awk -F'/' '{ print $(NF-1) }'` && echo 'hadoop' | sudo -S sed -i \"s@mesos.*/hadoop-yarn@mesos/$MY_TASK_ID/hadoop-yarn@g\" /usr/local/hadoop/etc/hadoop/yarn-site.xml")

      # Start the NodeManager, watch its pid, and stop it when the task ends.
      start = Process(name = 'start', cmdline = "source %s; %s start nodemanager; sleep 10;" % (BASHRC, YARN_DAEMONS))

      monitor = Process(name = 'monitor', cmdline = "sleep 10; PID=`cat /tmp/yarn-hduser-nodemanager.pid`; echo 'Monitoring nodemanager pid: ' ${PID}; while [ -e /proc/${PID} ]; do sleep 1; done")

      stop = Process(name = 'stop', final = True, cmdline = "source %s; %s stop nodemanager" % (BASHRC, YARN_DAEMONS))

      template_task = Task(
          processes = [pre_cleanup, make_cgroups_dir, configure_cgroups, start, monitor, stop],
          constraints = order(pre_cleanup, make_cgroups_dir, configure_cgroups, start, monitor) + order(stop)
      )

      # Two NodeManager profiles: small and large.
      small_task = template_task(name = 'small_task', resources = Resources(cpu = 1.0, ram = 512*MB, disk = 2048*MB))
      large_task = template_task(name = 'large_task', resources = Resources(cpu = 2.0, ram = 2048*MB, disk = 2048*MB))

      jobs = [Service(task = large_task, instances = instances, cluster = 'devcluster',
              role = ROLE, environment = 'devel', name = 'yarnlarge')]

      # Job config for a small task.
      # small_jobs = [Service(task = small_task, instances = instances, cluster = 'devcluster',
      #         role = ROLE, environment = 'devel', name = 'yarnsmall')]