Slide 1

Slide 1 text

Running YARN alongside Mesos
Mohit Soni (eBay Inc.) and Renan DelValle (SUNY Binghamton)

Slide 2

Slide 2 text

About Mesos
•  Cluster manager
•  Two-level scheduler (see the sketch below)
•  Supports service and analytical jobs

(Diagram: a Mesos master managing a set of nodes, with Marathon, MPI, Spark, and Aurora running as frameworks on top.)
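The "two-level scheduler" bullet is the key architectural point: Mesos makes resource offers, and each framework decides what, if anything, to run on them. Below is a minimal sketch of a framework scheduler using the classic Mesos Python bindings (mesos.interface / mesos.native); the framework name, resource thresholds, and ZooKeeper address are placeholder assumptions, not the presenters' code.

from mesos.interface import Scheduler, mesos_pb2
from mesos.native import MesosSchedulerDriver

class ControlPlaneScheduler(Scheduler):
    def registered(self, driver, framework_id, master_info):
        print("Registered with framework id %s" % framework_id.value)

    def resourceOffers(self, driver, offers):
        # Second level of scheduling: the framework, not Mesos, decides
        # what (if anything) to run on each offer it receives.
        for offer in offers:
            cpus = sum(r.scalar.value for r in offer.resources if r.name == "cpus")
            mem = sum(r.scalar.value for r in offer.resources if r.name == "mem")
            if cpus >= 2.5 and mem >= 2560:
                # Build TaskInfo protobufs here (e.g. a YARN NodeManager)
                # and call driver.launchTasks(offer.id, tasks).
                pass
            else:
                driver.declineOffer(offer.id)

if __name__ == "__main__":
    framework = mesos_pb2.FrameworkInfo(user="", name="yarn-control-plane")
    MesosSchedulerDriver(ControlPlaneScheduler(), framework,
                         "zk://localhost:2181/mesos").run()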

Slide 3

Slide 3 text

About YARN
•  Resource Negotiator
•  Single-level scheduler
•  Supports different types of analytical jobs

Slide 4

Slide 4 text

Problem Statement
•  Independent resource managers statically partition the datacenter
•  Mesos supports long-running services and analytics workloads
•  The YARN ecosystem is currently centered on analytics/data processing

Goal
Share resources between YARN and Mesos, with Mesos being the resource manager for the data center.

Solution Characteristics
•  Non-intrusive: avoids modifying Mesos or YARN protocols
•  Easy future upgrades
•  Easier certification path
•  Use YARN's scheduling data for providing and rescinding resources to YARN

Slide 5

Slide 5 text

YARN Architecture Overview

(Diagram: the YARN ResourceManager coordinating a NodeManager on each node; an AppMaster runs in one container and the remaining containers run application tasks.)

Slide 6

Slide 6 text

How it works

(Diagram: the control plane consists of the YARN ResourceManager and a Mesos framework alongside the Mesos master; each node runs a Mesos slave, and nothing else yet.)

Slide 7

Slide 7 text

How it works

(Diagram: the control plane uses a Mesos offer to launch a YARN NodeManager on the node, taking 2.5 CPU and 2.5 GB through the Mesos slave.)

Slide 8

Slide 8 text

How it works

(Diagram: the NodeManager registers with the YARN ResourceManager, advertising 2.0 CPU and 2.0 GB out of the 2.5 CPU / 2.5 GB granted by Mesos.)
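The gap between what is taken from Mesos (2.5 CPU / 2.5 GB) and what is advertised to YARN (2.0 CPU / 2.0 GB) presumably leaves headroom for the NodeManager daemon itself. Below is a hedged sketch of how the control plane could build the Mesos task: the config-templating command and placeholder tokens are illustrative, but yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores are the actual YARN properties that cap what a NodeManager advertises.

from mesos.interface import mesos_pb2

NM_TOTAL = {"cpus": 2.5, "mem": 2560}          # taken from the Mesos offer (mem in MB)
NM_ADVERTISED = {"mem_mb": 2048, "vcores": 2}  # what YARN is allowed to schedule

def nodemanager_task(offer):
    task = mesos_pb2.TaskInfo()
    task.task_id.value = "nodemanager-%s" % offer.hostname
    task.slave_id.value = offer.slave_id.value
    task.name = "yarn-nodemanager"
    for name, value in NM_TOTAL.items():
        res = task.resources.add()
        res.name = name
        res.type = mesos_pb2.Value.SCALAR
        res.scalar.value = value
    # Illustrative templating of yarn-site.xml, then start the daemon:
    #   yarn.nodemanager.resource.memory-mb  -> 2048
    #   yarn.nodemanager.resource.cpu-vcores -> 2
    task.command.value = (
        "sed -i 's/@NM_MEM@/%d/;s/@NM_VCORES@/%d/' $HADOOP_CONF_DIR/yarn-site.xml && "
        "$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager"
        % (NM_ADVERTISED["mem_mb"], NM_ADVERTISED["vcores"]))
    return task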

Slide 9

Slide 9 text

How it works

(Diagram: the ResourceManager asks the NodeManager to launch containers C1 and C2 on the node.)

Slide 10

Slide 10 text

How it works

(Diagram: the cgroups hierarchy under /sys/fs/cgroup/cpu/mesos: the NodeManager's /hadoop-yarn subtree, containing /C1 and /C2, is nested inside the cgroup Mesos created for the NodeManager task.)
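This nesting is what the configure_cgroups step of the sample Aurora job (last slide) does with sed. A rough Python equivalent, assuming an Aurora/Thermos sandbox layout where the Mesos task id is the name of the sandbox's parent directory:

import os, re

YARN_SITE = "/usr/local/hadoop/etc/hadoop/yarn-site.xml"

def nest_yarn_cgroups(sandbox_dir, yarn_site=YARN_SITE):
    # Same trick as the awk one-liner in the Aurora job: the Mesos task id
    # is the parent directory of the sandbox we are running in.
    task_id = os.path.basename(os.path.dirname(sandbox_dir.rstrip("/")))
    with open(yarn_site) as f:
        conf = f.read()
    # Point yarn.nodemanager.linux-container-executor.cgroups.hierarchy
    # (and any other cgroup paths) at mesos/<task_id>/hadoop-yarn.
    conf = re.sub(r"mesos[^<]*/hadoop-yarn",
                  "mesos/%s/hadoop-yarn" % task_id, conf)
    with open(yarn_site, "w") as f:
        f.write(conf)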

Slide 11

Slide 11 text

Scenario 1: Handling Traffic Spikes

(Diagram: two Hadoop cluster nodes, each with 8.5 CPU / 8.5 GB in total; on each node a NodeManager occupies 8 CPU / 8 GB under a Mesos slave. A traffic spike hits ebay.com/sch.)

Slide 12

Slide 12 text

Scenario 1: Handling Traffic Spikes

(Diagram: the control plane resizes the NodeManager on each node down to 6 CPU / 6 GB, freeing capacity on both Hadoop cluster nodes.)

Slide 13

Slide 13 text

Scenario 1: Handling Traffic Spikes

(Diagram: the sch service is deployed on each node with 2 CPU / 2 GB, alongside the resized 6 CPU / 6 GB NodeManagers.)
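One way the control plane or an operator could deploy sch onto the freed capacity is through Marathon, which appears on the "About Mesos" slide (Aurora would work similarly). A hedged sketch against Marathon's /v2/apps REST endpoint; the host name, app id, and start command are placeholder assumptions.

import requests

MARATHON = "http://marathon.example.com:8080"  # assumed Marathon endpoint

app = {
    "id": "/ebay/sch",            # hypothetical app id
    "cmd": "./bin/start-sch.sh",  # placeholder start command
    "cpus": 2.0,
    "mem": 2048,                  # MB, i.e. the 2 CPU / 2 GB on the slide
    "instances": 2,               # one per Hadoop cluster node
}
resp = requests.post(MARATHON + "/v2/apps", json=app)
resp.raise_for_status()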

Slide 14

Slide 14 text

Scenario 1: Handling Traffic Spikes

(Diagram: with sch running next to the shrunken NodeManagers on both nodes, ebay.com/sch survives the traffic spike.)

Slide 15

Slide 15 text

Scenario 1: Handling Traffic Spikes

(Diagram: once the spike passes, the cluster is restored: the NodeManagers on both nodes are resized back to 8 CPU / 8 GB.)

Slide 16

Slide 16 text

YARN Pending Improvements
•  Restarting a NodeManager kills its child containers (YARN-1336)
•  Restarting an AppMaster kills its child containers across all nodes (YARN-1489)
•  NodeManager's pending support for the cgroups memory subsystem
•  Getting richer scheduling information from the ResourceManager

Slide 17

Slide 17 text

Future Vision
•  One unified cluster per datacenter
•  YARN and Mesos tasks co-exist on nodes
•  Provision resources on demand

(Diagram: a single control plane, the YARN ResourceManager plus the Mesos framework and master, managing every node in the datacenter.)

Slide 18

Slide 18 text

Scenario 2: Scaling YARN

(Diagram: two nodes with 8 CPU / 8 GB each. Node 1 runs Service X (2 CPU / 2 GB) and Service Y (6 CPU / 6 GB); Node 2 runs Service X (2 CPU / 2 GB) and a NodeManager (4 CPU / 4 GB), all under Mesos slaves.)

Slide 19

Slide 19 text

Scenario 2: Scaling YARN

(Diagram: a user job, an AppMaster (A) with Map (M) and Reduce (R) tasks, is submitted to YARN and arrives at the NodeManager on Node 2.)

Slide 20

Slide 20 text

Scenario 2: Scaling YARN

(Diagram: the AppMaster (A) is running on the existing NodeManager, while the Map (M) and Reduce (R) tasks wait for capacity.)

Slide 21

Slide 21 text

Scenario 2: Scaling YARN

(Diagram: the control plane detects starvation: the pending Map and Reduce tasks cannot be scheduled on the available NodeManager capacity.)
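A hedged sketch of what "detects starvation" might boil down to: compare YARN's pending resource requests (the data exposed by the snapshot API described a few slides later) against the free capacity of the running NodeManagers. The field names are assumptions based on that API slide.

def is_starved(pending_requests, nm_free_mem_mb, nm_free_vcores):
    """pending_requests: iterable of dicts with 'memory' (MB), 'vCores'
    and 'numContainers', mirroring the snapshot API fields."""
    want_mem = sum(r["memory"] * r["numContainers"] for r in pending_requests)
    want_cpu = sum(r["vCores"] * r["numContainers"] for r in pending_requests)
    return want_mem > nm_free_mem_mb or want_cpu > nm_free_vcores

# e.g. 94 pending containers of 1024 MB / 1 vcore (the sample on the
# YARN-2408 slide) against 4 GB / 4 cores of free NodeManager capacity:
print(is_starved([{"memory": 1024, "vCores": 1, "numContainers": 94}], 4096, 4))  # True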

Slide 22

Slide 22 text

Scenario 2: Scaling YARN

(Diagram: the control plane preempts Service Y on Node 1, freeing 6 CPU / 6 GB.)

Slide 23

Slide 23 text

Scenario 2: Scaling YARN

(Diagram: the control plane launches a new NodeManager with 4 CPU / 4 GB on Node 1, in the capacity freed by preempting Service Y.)

Slide 24

Slide 24 text

Scenario 2: Scaling YARN

(Diagram: YARN schedules the Map (M) and Reduce (R) tasks on the newly launched NodeManager.)

Slide 25

Slide 25 text

New YARN API (YARN-2408)
•  Resource Requests snapshot API:
   •  Memory
   •  Virtual Cores
   •  Locality constraint
•  REST API with JSON & XML output
•  Non-intrusive: simply exposes more information from the ResourceManager
•  Helps the control plane decide NodeManager sizing (see the sketch below)

(The slide shows sample XML output: per application attempt (a truncated application_/appattempt_ id, queue "default"), pending totals of 96256 MB across 94 containers, and three resource requests of 1024 MB / 1 vcore for the resource names /default-rack, *, and master, each with 94 containers and what appear to be relax-locality and priority values of true and 20. The XML tags themselves were lost in extraction.)
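A hedged sketch of how the control plane might consume this API. The slide only states that it is a REST API on the ResourceManager with JSON/XML output; the URL path and JSON field names below are assumptions, since YARN-2408 was still pending at the time of the talk.

import requests

RM = "http://resourcemanager.example.com:8088"   # assumed RM web address

# Assumed endpoint path and JSON shape for the proposed snapshot API.
resp = requests.get(RM + "/ws/v1/cluster/resource-requests",
                    headers={"Accept": "application/json"})
resp.raise_for_status()

for req in resp.json().get("resourceRequests", []):
    print(req.get("memory"), req.get("vCores"),
          req.get("resourceName"), req.get("numContainers"))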

Slide 26

Slide 26 text

Control Plane (Mesos Framework?)

Design scope:
•  Flex up or down, vertically or horizontally?
•  Determining the NodeManager profile for flex up
   –  Small (2 CPU, 4 GB RAM) or Large (8 CPU, 24 GB RAM)
•  Choosing NodeManager(s) to flex down (see the sketch below), avoiding ones
   –  that run an AppMaster container
   –  whose child containers are critical (e.g. HBase region servers)
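A sketch of the flex-down selection rule in the last bullet; the NodeManager record layout is assumed purely for illustration (see also the small/large task profiles in the sample Aurora job on the last slide).

def pick_flex_down_candidates(nodemanagers, needed):
    """nodemanagers: dicts with 'id', 'hosts_appmaster' (bool),
    'critical_containers' (int), 'running_containers' (int),
    'cpus' and 'mem_mb'. Returns the ids of NodeManagers to shrink."""
    # Never touch NodeManagers that host an AppMaster or critical containers.
    safe = [nm for nm in nodemanagers
            if not nm["hosts_appmaster"] and nm["critical_containers"] == 0]
    # Prefer the emptiest NodeManagers: until YARN-1336 lands, restarting
    # a NodeManager kills its child containers.
    safe.sort(key=lambda nm: nm["running_containers"])
    picked, freed_cpu, freed_mem = [], 0.0, 0
    for nm in safe:
        if freed_cpu >= needed["cpus"] and freed_mem >= needed["mem_mb"]:
            break
        picked.append(nm["id"])
        freed_cpu += nm["cpus"]
        freed_mem += nm["mem_mb"]
    return picked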

Slide 27

Slide 27 text

Thanks!

Slide 28

Slide 28 text

Sample Aurora Job

# imports
pre_cleanup = Process(...)
make_cgroups_dir = Process(...)

# Rewrite yarn-site.xml so YARN's cgroup hierarchy is nested under this
# task's Mesos cgroup (the task id is the sandbox's parent directory).
configure_cgroups = Process(
    name = 'configure_cgroups',
    cmdline = "MY_TASK_ID=`pwd | awk -F'/' '{ print $(NF-1) }'` && "
              "echo 'hadoop' | sudo -S sed -i \"s@mesos.*/hadoop-yarn@mesos/$MY_TASK_ID/hadoop-yarn@g\" "
              "/usr/local/hadoop/etc/hadoop/yarn-site.xml")

start = Process(
    name = 'start',
    cmdline = "source %s; %s start nodemanager; sleep 10;" % (BASHRC, YARN_DAEMONS))

# Keep the Aurora task alive for as long as the NodeManager process runs.
monitor = Process(
    name = 'monitor',
    cmdline = "sleep 10; PID=`cat /tmp/yarn-hduser-nodemanager.pid`; "
              "echo 'Monitoring nodemanager pid: ' ${PID}; "
              "while [ -e /proc/${PID} ]; do sleep 1; done")

stop = Process(
    name = 'stop',
    final = True,
    cmdline = "source %s; %s stop nodemanager" % (BASHRC, YARN_DAEMONS))

template_task = Task(
    processes = [pre_cleanup, make_cgroups_dir, configure_cgroups, start, monitor, stop],
    constraints = order(pre_cleanup, make_cgroups_dir, configure_cgroups, start, monitor) + order(stop)
)

# Two NodeManager profiles for the control plane to choose from.
small_task = template_task(name = 'small_task', resources = Resources(cpu=1.0, ram=512*MB, disk=2048*MB))
large_task = template_task(name = 'large_task', resources = Resources(cpu=2.0, ram=2048*MB, disk=2048*MB))

jobs = [Service(task = large_task, instances = instances, cluster = 'devcluster',
                role = ROLE, environment = 'devel', name = 'yarnlarge')]

# Job config, for a small task.
# small_jobs = [Service(task = small_task, instances = instances, cluster = 'devcluster',
#                       role = ROLE, environment = 'devel', name = 'yarnsmall')]
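Usage note: a config like this is submitted with the Aurora client, e.g. something along the lines of "aurora job create devcluster/ROLE/devel/yarnlarge nodemanager.aurora"; the exact subcommand and config file name depend on the Aurora client version and are assumptions here.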