
Stateful Data-Parallel Processing


Raul Castro Fernandez, Researcher at Imperial College London; talk at the @d_ldn meetup

Data Science London

June 04, 2015



Transcript

1. How do high-level programs map onto dataflows for scalable and fault-tolerant execution?

2. Functional or declarative oriented programs -> stateless dataflow graphs -> scalable and fault-tolerant execution in large-scale distributed systems.

3. Functional or declarative oriented programs -> stateless dataflow graphs -> scalable and fault-tolerant execution in large-scale distributed systems. Matlab, Java, Python, R.

4. Democratization of Data: developers and DBAs are no longer the only ones generating, processing and analyzing data. Decision makers, domain scientists, application users, journalists, crowd workers, everyday consumers, sales, marketing... they all benefit from insights derived from data.

5. First step to democratize Big Data: offer a familiar programming interface.

6. Mutable State in a Recommender System

       Matrix userItem = new Matrix();
       Matrix coOcc = new Matrix();

   User-Item matrix (UI):              Co-Occurrence matrix (CO):
              Item-A   Item-B                     Item-A   Item-B
   User-A       4        5              Item-A      1        1
   User-B       0        5              Item-B      1        2

7. Mutable State in a Recommender System: update with new ratings

       Matrix userItem = new Matrix();
       Matrix coOcc = new Matrix();

       void addRating(int user, int item, int rating) {
           userItem.setElement(user, item, rating);
           updateCoOccurrence(coOcc, userItem);
       }

   (User-Item (UI) and Co-Occurrence (CO) matrices as on the previous slide.)

8. Mutable State in a Recommender System: multiply for a recommendation

       Matrix userItem = new Matrix();
       Matrix coOcc = new Matrix();

       void addRating(int user, int item, int rating) {
           userItem.setElement(user, item, rating);
           updateCoOccurrence(coOcc, userItem);
       }

       Vector getRec(int user) {
           Vector userRow = userItem.getRow(user);
           Vector userRec = coOcc.multiply(userRow);
           return userRec;
       }

   (Recommendation for User-B: the co-occurrence matrix CO multiplied by User-B's row of UI.)

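The helper updateCoOccurrence is referenced but never shown in the deck. A minimal sketch of what such an update could look like, assuming an incremental variant that also receives the new (user, item) pair and assuming Matrix helpers numCols/getElement/setElement (none of these signatures come from the talk):

       // Hedged sketch, not the talk's code: after 'user' rates 'item', every item
       // that user has already rated co-occurs once more with 'item'.
       void updateCoOccurrence(Matrix coOcc, Matrix userItem, int user, int item) {
           for (int other = 0; other < userItem.numCols(); other++) {
               if (userItem.getElement(user, other) == 0) continue;   // not rated by this user
               if (other == item) {
                   coOcc.setElement(item, item, coOcc.getElement(item, item) + 1);
               } else {
                   coOcc.setElement(item, other, coOcc.getElement(item, other) + 1);
                   coOcc.setElement(other, item, coOcc.getElement(other, item) + 1);
               }
           }
       }
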
9. Challenges When Executing with Big Data

   Big Data problem: the matrices become large.

       Matrix userItem = new Matrix();
       Matrix coOcc = new Matrix();

   > Mutable state leads to concise algorithms but complicates parallelism and fault tolerance
   > Cannot lose state after failure
   > Need to manage state to support data-parallelism

10. Using Current Distributed Dataflow Frameworks

   Input data -> output data

   > No mutable state simplifies fault tolerance
   > MapReduce: Map and Reduce tasks
   > Storm: no support for state
   > Spark: immutable RDDs

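To make the contrast concrete: with immutable state, updating the recommender's matrices means producing new versions rather than mutating them in place, which keeps recovery simple (old versions can be recomputed or replayed) but is awkward for algorithms built around mutable state. A minimal sketch, assuming a copyOf helper that is not from the talk:

       // Hedged sketch of an immutable-style update: every new rating yields a
       // fresh matrix instead of an in-place write.
       Matrix addRatingImmutable(Matrix userItem, int user, int item, int rating) {
           Matrix next = copyOf(userItem);      // whole-state copy
           next.setElement(user, item, rating);
           return next;                         // the previous version stays valid for replay
       }
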
11. Imperative Big Data Processing

   > Programming distributed dataflow graphs requires learning new programming models

12. Imperative Big Data Processing

   > Programming distributed dataflow graphs requires learning new programming models

   Our goal: run Java programs with mutable state, but with the performance and fault tolerance of distributed dataflow systems.

13. Stateful Dataflow Graphs: From Imperative Programs to Distributed Dataflows

   Program.java -> SDGs (Stateful Dataflow Graphs)

   > Mutable distributed state in dataflow graphs
   > @Annotations help with the translation from Java to SDGs
   > Checkpoint-based fault tolerance recovers mutable state after failure

14. Outline
   - Motivation
   - SDG: Stateful Dataflow Graphs
   - Handling distributed state in SDGs
   - Translating Java programs to SDGs
   - Checkpoint-based fault tolerance for SDGs
   - Experimental evaluation

15. SDG: Data, State and Computation

   > SDGs separate data and state to allow data and pipeline parallelism
   - Task Elements (TEs) process data
   - State Elements (SEs) represent state
   - Dataflows represent data
   > Task Elements have local access to State Elements

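A minimal sketch of how these two roles could look to a programmer; all interface and method names below are invented for illustration, since the deck does not show the concrete SEEP interfaces:

       // Hedged sketch: the two SDG building blocks.
       interface StateElement {}                                 // mutable state (SE)
       interface TaskElement<I, O> { O process(I input); }       // computation (TE)

       class UserItemState implements StateElement {
           final Matrix userItem = new Matrix();                 // Matrix type from the slides
       }

       class AddRating implements TaskElement<int[], Void> {
           private final UserItemState se;                       // TE has local access to its SE
           AddRating(UserItemState se) { this.se = se; }
           public Void process(int[] r) {                        // r = {user, item, rating}
               se.userItem.setElement(r[0], r[1], r[2]);
               return null;
           }
       }
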
16. Distributed Mutable State

   State Elements support two abstractions for distributed mutable state:
   - Partitioned SEs: task elements always access state by key
   - Partial SEs: task elements can access the complete state

17. Distributed Mutable State: Partitioned SEs

   > Partitioned SEs are split into disjoint partitions
   - State partitioned according to a partitioning key, e.g. the User-Item matrix (UI)
   - Dataflow routed according to a hash function, hash(msg.id); access is by key
   - Key space [0-N] split into partitions [0-k] and [(k+1)-N]

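A minimal sketch of the routing idea, assuming a fixed number of partitions and a simple modulo hash rather than the range split [0-k], [(k+1)-N] drawn on the slide; the class and method names are illustrative:

       // Hedged sketch: every access to a partitioned SE goes through the key,
       // so the per-node slices of the User-Item matrix never overlap.
       class PartitionedUserItem {
           private final Matrix[] partitions;                    // one disjoint slice per node

           PartitionedUserItem(int numPartitions) {
               partitions = new Matrix[numPartitions];
               for (int i = 0; i < numPartitions; i++) partitions[i] = new Matrix();
           }

           private int partitionOf(int user) {
               return Math.floorMod(user, partitions.length);    // stands in for hash(msg.id)
           }

           void setElement(int user, int item, int rating) {
               partitions[partitionOf(user)].setElement(user, item, rating);
           }

           Vector getRow(int user) {
               return partitions[partitionOf(user)].getRow(user);
           }
       }
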
18. Distributed Mutable State: Partial SEs

   > A Partial SE gives each node a local state instance
   > Partial SE access by TEs can be local or global
   - Local access: data sent to one node
   - Global access: data sent to all nodes

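A sketch of the two access modes with the dispatch written out explicitly; in the real system this routing is the runtime's job, and all names here are illustrative:

       import java.util.ArrayList;
       import java.util.List;

       // Hedged sketch: a partial SE keeps one instance per node; an access either
       // targets one local instance or is broadcast to every instance.
       class PartialCoOcc {
           private final List<Matrix> instances = new ArrayList<>();   // one matrix per node

           PartialCoOcc(int nodes) {
               for (int i = 0; i < nodes; i++) instances.add(new Matrix());
           }

           // Local access: the data item is sent to (and processed on) one node only.
           Vector multiplyLocal(int node, Vector userRow) {
               return instances.get(node).multiply(userRow);
           }

           // Global access: the data item is sent to all nodes; each returns a partial value.
           List<Vector> multiplyGlobal(Vector userRow) {
               List<Vector> partials = new ArrayList<>();
               for (Matrix m : instances) partials.add(m.multiply(userRow));
               return partials;
           }
       }
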
19. Merging Distributed Mutable State

   > Reading all partial SE instances results in a set of partial values
   > Requires application-specific merge logic

20. Merging Distributed Mutable State: the multiple partial values feed the merge logic.

21. Merging Distributed Mutable State: collect the partial values, then apply the merge logic.

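For the recommender, a natural choice of merge logic (not specified on the slides) is element-wise summation of the partial recommendation vectors. A sketch, assuming Vector helpers numElements/getElement/setElement and a size constructor:

       // Hedged sketch of application-specific merge logic: collapse the partial
       // values read from all partial SE instances into a single result.
       Vector merge(Vector[] partials) {
           Vector result = new Vector(partials[0].numElements());
           for (Vector p : partials) {
               for (int i = 0; i < p.numElements(); i++) {
                   result.setElement(i, result.getElement(i) + p.getElement(i));
               }
           }
           return result;
       }
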
22. Outline (@Annotations)
   - Motivation
   - SDG: Stateful Dataflow Graphs
   - Handling distributed state in SDGs
   - Translating Java programs to SDGs
   - Checkpoint-based fault tolerance for SDGs
   - Experimental evaluation

23. From Imperative Code to Execution

   Annotated Program.java -> SEEP

   > SEEP: data-parallel processing platform
   > Translation occurs in two stages:
   - Static code analysis: from Java to SDG
   - Bytecode rewriting: from SDG to SEEP [SIGMOD'13]

24. Translation Process

   Program.java -> extract TEs, SEs and accesses -> live variable analysis -> TE and SE access code assembly -> SEEP runnable
   (SOOT framework for the analysis; Javassist and Janino for code generation)

   > Extract state and state access patterns through static code analysis
   > Generation of runnable code using TE and SE connections

25. Translation Process: the same pipeline, starting from the annotated Program.java.

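As a point of reference for the code-assembly stage, this is what emitting a runnable class with Javassist generally looks like; it is a generic Javassist usage sketch, not SEEP's actual generator, and the generated class and method are invented:

       import javassist.ClassPool;
       import javassist.CtClass;
       import javassist.CtNewMethod;

       // Hedged sketch: assemble a task-element class at the bytecode level.
       public class GenerateTaskElement {
           public static Class<?> generate() throws Exception {
               ClassPool pool = ClassPool.getDefault();
               CtClass te = pool.makeClass("generated.AddRatingTE");
               // the assembled calls into the state elements would form this method body
               te.addMethod(CtNewMethod.make(
                   "public void process(int user, int item, int rating) { }", te));
               return te.toClass();   // load the generated bytecode
           }
       }
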
26. Partitioned State Annotation

       @Partitioned Matrix userItem = new SeepMatrix();
       Matrix coOcc = new Matrix();

       void addRating(int user, int item, int rating) {
           userItem.setElement(user, item, rating);
           updateCoOccurrence(coOcc, userItem);
       }

       Vector getRec(int user) {
           Vector userRow = userItem.getRow(user);
           Vector userRec = coOcc.multiply(userRow);
           return userRec;
       }

   > The @Partitioned field annotation indicates partitioned state, routed by hash(msg.id)

27. Partial State and Global Annotations

       @Partitioned Matrix userItem = new SeepMatrix();
       @Partial Matrix coOcc = new SeepMatrix();

       void addRating(int user, int item, int rating) {
           userItem.setElement(user, item, rating);
           updateCoOccurrence(@Global coOcc, userItem);
       }

   > The @Partial field annotation indicates partial state
   > @Global annotates a variable to indicate access to all partial instances

28. Partial and Collection Annotations

       @Partitioned Matrix userItem = new SeepMatrix();
       @Partial Matrix coOcc = new SeepMatrix();

       Vector getRec(int user) {
           Vector userRow = userItem.getRow(user);
           @Partial Vector puRec = @Global coOcc.multiply(userRow);
           Vector userRec = merge(puRec);
           return userRec;
       }

       Vector merge(@Collection Vector[] v) {
           /*...*/
       }

   > The @Collection annotation indicates merge logic

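Reading the annotations together: the @Global access fans the user row out to every partial coOcc instance, the results come back as a collection of partial vectors, and the @Collection parameter marks the merge that collapses them. In terms of the earlier illustrative sketches (PartitionedUserItem, PartialCoOcc and the summation merge), the annotated getRec corresponds roughly to:

       // Hedged sketch of the runtime behaviour implied by the annotations.
       Vector getRecExpanded(int user, PartitionedUserItem userItem, PartialCoOcc coOcc) {
           Vector userRow = userItem.getRow(user);                          // partitioned access by key
           java.util.List<Vector> partials = coOcc.multiplyGlobal(userRow); // @Global -> all instances
           return merge(partials.toArray(new Vector[0]));                   // @Collection -> merge logic
       }
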
29. Outline (Failures)
   - Motivation
   - SDG: Stateful Dataflow Graphs
   - Handling distributed state in SDGs
   - Translating Java programs to SDGs
   - Checkpoint-based fault tolerance for SDGs
   - Experimental evaluation

30. Challenges of Making SDGs Fault Tolerant

   Physical deployment of the SDG:
   > Task elements access local in-memory state
   > Node failures may lead to state loss

31. Challenges of Making SDGs Fault Tolerant

   Physical deployment of the SDG on physical nodes, with state held in RAM:
   > Task elements access local in-memory state
   > Node failures may lead to state loss

32. Challenges of Making SDGs Fault Tolerant

   > Task elements access local in-memory state; node failures may lead to state loss

   Checkpointing state:
   - No updates allowed while state is being checkpointed
   - Checkpointing state should not impact the data processing path

33. Challenges of Making SDGs Fault Tolerant

   > Task elements access local in-memory state; node failures may lead to state loss

   Checkpointing state:
   - No updates allowed while state is being checkpointed
   - Checkpointing state should not impact the data processing path

   State backup:
   - Backups are large and cannot be stored in memory
   - Large writes to disk over the network have a high cost

34. Checkpoint Mechanism for Fault Tolerance

   Asynchronous, lock-free checkpointing:
   1. Freeze the mutable state for checkpointing
   2. Dirty state supports updates concurrently
   3. Reconcile the dirty state

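A minimal sketch of the dirty-state idea for one state element, assuming a copyOf helper and leaving out how the snapshot reaches the backup nodes. The real mechanism is lock-free; this sketch uses coarse synchronization only to keep the idea readable, so it should not be read as SEEP's implementation:

       import java.util.ArrayList;
       import java.util.List;

       // Hedged sketch: divert writes into a 'dirty' overlay while the frozen
       // state is copied, then reconcile the overlay back into the state.
       class CheckpointableState {
           static class Update {
               final int user, item, rating;
               Update(int u, int i, int r) { user = u; item = i; rating = r; }
           }

           private final Matrix state = new Matrix();   // live mutable state
           private List<Update> dirty = null;           // non-null while a checkpoint is running

           synchronized void set(int user, int item, int rating) {
               if (dirty != null) dirty.add(new Update(user, item, rating)); // 2. concurrent updates
               else state.setElement(user, item, rating);
           }

           synchronized Matrix startCheckpoint() {
               dirty = new ArrayList<>();               // 1. freeze mutable state
               return copyOf(state);                    // snapshot handed to the backup path
           }

           synchronized void finishCheckpoint() {
               for (Update u : dirty) state.setElement(u.user, u.item, u.rating); // 3. reconcile
               dirty = null;
           }
       }
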
35. Distributed M-to-N Checkpoint Backup

   > M-to-N distributed backup and parallel recovery

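One plausible reading of M-to-N backup and parallel recovery (an assumption, not spelled out in the deck) is that each node's checkpoint is scattered over several backup nodes, so that after a failure the chunks can be fetched and re-partitioned onto the new owners in parallel. A sketch, reusing the illustrative PartitionedUserItem from earlier; BackupNode and restore are invented names:

       import java.util.List;

       // Hedged sketch of parallel recovery from scattered checkpoint chunks.
       class ParallelRecovery {
           interface BackupNode { Matrix loadChunk(); }   // assumed: returns this node's chunk

           static void recover(List<BackupNode> backups, PartitionedUserItem target) {
               // each chunk is restored concurrently (parallel recovery)
               backups.parallelStream().forEach(b -> restore(b.loadChunk(), target));
           }

           static void restore(Matrix chunk, PartitionedUserItem target) {
               // re-route every (user, item, rating) entry by key so it lands on the
               // correct partition of the rebuilt state (Matrix iteration is assumed)
           }
       }
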
44. Evaluation of SDG Performance

   Questions:
   - How does mutable state impact performance?
   - How efficient are translated SDGs?
   - What is the throughput/latency trade-off?

   Experimental set-up:
   - Amazon EC2 (c1 and m1 xlarge instances)
   - Private cluster (4-core 3.4 GHz Intel Xeon servers with 8 GB RAM)
   - Sun Java 7, Ubuntu 12.04, Linux kernel 3.10

45. Processing with Large Mutable State

   > addRating and getRec functions from the recommender algorithm, while changing the read/write ratio
   Combines batch and online processing to serve fresh results over large mutable state.
   [Plot: throughput (1000 requests/s) and latency (ms) vs workload (state read/write ratio, 1:5 to 5:1).]

46. Efficiency of Translated SDG

   > Batch-oriented, iterative logistic regression
   The translated SDG achieves performance similar to a non-mutable dataflow.
   [Plot: throughput (GB/s) vs number of nodes (25-100), SDG vs Spark.]

47. Latency/Throughput Tradeoff

   > Streaming word count query, reporting counts over windows
   SDGs achieve high throughput while maintaining low latency.
   [Plot: throughput (1000 requests/s) vs window size (ms), SDG vs Naiad-LowLatency.]

48. Latency/Throughput Tradeoff
   [Same query; the plot adds Naiad-HighThroughput and Streaming Spark.]

49. Latency/Throughput Tradeoff
   [Same query; the plot shows SDG, Naiad-LowLatency, Naiad-HighThroughput and Streaming Spark together.]

50. Summary

   Running Java programs with the performance of current distributed dataflow frameworks.

   SDG: Stateful Dataflow Graphs
   - Abstractions for distributed mutable state
   - Annotations to disambiguate types of distributed state and state access
   - Checkpoint-based fault tolerance mechanism

51. Summary

   Thank you! Any questions?
   @raulcfernandez
   [email protected]
   https://github.com/raulcf/SEEPng/