Slide 1

Stateful Data-Parallel Processing
Raul Castro Fernandez
Imperial College London
[email protected]
@raulcfernandez
21 April 2015

Slide 2

High-Level programs → ? → Dataflows → Scalable and Fault-Tolerant execution

Slide 3

High-Level programs (functional or declarative oriented programs) → Stateless dataflow graphs → Scalable and Fault-Tolerant execution in large-scale distributed systems

Slide 4

Functional or declarative oriented programs (Matlab, Java, Python, R) → Stateless dataflow graphs → Scalable and Fault-Tolerant execution in large-scale distributed systems

Slide 5

Democratization of Data

Developers and DBAs are no longer the only ones generating, processing and analyzing data.

Decision makers, domain scientists, application users, journalists, crowd workers, and everyday consumers, sales, marketing… they all benefit from insights derived from data.

Slide 6

First step to democratize Big Data: to offer a familiar programming interface

Slide 7

Mutable State in a Recommender System

Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

User-Item matrix (UI):
         Item-A  Item-B
User-A   4       5
User-B   0       5

Co-Occurrence matrix (CO):
         Item-A  Item-B
Item-A   1       1
Item-B   1       2

Slide 8

Mutable State in a Recommender System

Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(coOcc, userItem);
}

User-Item matrix (UI):
         Item-A  Item-B
User-A   4       5
User-B   0       5

Co-Occurrence matrix (CO), updated with new ratings:
         Item-A  Item-B
Item-A   1       1
Item-B   1       2

Slide 9

Mutable State in a Recommender System

Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(coOcc, userItem);
}

Vector getRec(int user) {
    Vector userRow = userItem.getRow(user);
    Vector userRec = coOcc.multiply(userRow);
    return userRec;
}

User-Item matrix (UI), updated with new ratings:
         Item-A  Item-B
User-A   4       5
User-B   0       5

Co-Occurrence matrix (CO), multiplied for recommendation:
         Item-A  Item-B
Item-A   1       1
Item-B   1       2
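
To make the snippet above concrete, here is a minimal, self-contained Java sketch of the same recommender logic. The tiny Matrix class, the use of double[] instead of Vector, and the particular co-occurrence update rule are assumptions made only so the example runs; this is not the implementation from the talk.

import java.util.Arrays;

// Hypothetical dense matrix, just enough to run addRating/getRec.
class Matrix {
    private final double[][] m;
    Matrix(int rows, int cols) { m = new double[rows][cols]; }
    void setElement(int r, int c, double v) { m[r][c] = v; }
    double getElement(int r, int c) { return m[r][c]; }
    double[] getRow(int r) { return m[r].clone(); }
    int rows() { return m.length; }
    int cols() { return m[0].length; }
    double[] multiply(double[] v) {             // matrix-vector product
        double[] out = new double[rows()];
        for (int i = 0; i < rows(); i++)
            for (int j = 0; j < cols(); j++)
                out[i] += m[i][j] * v[j];
        return out;
    }
}

class Recommender {
    // Mutable state shared by both operations (slides 7-9).
    static Matrix userItem = new Matrix(2, 2);  // User-Item matrix (UI)
    static Matrix coOcc    = new Matrix(2, 2);  // Co-Occurrence matrix (CO)

    static void addRating(int user, int item, int rating) {
        userItem.setElement(user, item, rating);
        updateCoOccurrence(coOcc, userItem);    // update CO with new ratings
    }

    static double[] getRec(int user) {
        double[] userRow = userItem.getRow(user);
        return coOcc.multiply(userRow);         // multiply for recommendation
    }

    // One simple co-occurrence rule: count how many users rated both items.
    static void updateCoOccurrence(Matrix co, Matrix ui) {
        for (int i = 0; i < co.rows(); i++)
            for (int j = 0; j < co.cols(); j++) {
                int c = 0;
                for (int u = 0; u < ui.rows(); u++)
                    if (ui.getElement(u, i) > 0 && ui.getElement(u, j) > 0) c++;
                co.setElement(i, j, c);
            }
    }

    public static void main(String[] args) {
        addRating(0, 0, 4); addRating(0, 1, 5); addRating(1, 1, 5);
        System.out.println(Arrays.toString(getRec(1)));  // prints [5.0, 10.0]
    }
}

addRating mutates both matrices in place; this is exactly the mutable state that the following slides need to partition, merge and checkpoint.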

Slide 10

Challenges When Executing with Big Data

Matrix userItem = new Matrix();
Matrix coOcc = new Matrix();

Big Data Problem: Matrices become large

> Mutable state leads to concise algorithms but complicates parallelism and fault tolerance
> Cannot lose state after failure
> Need to manage state to support data-parallelism

Slide 11

Using Current Distributed Dataflow Frameworks

Input data → Output data

> No mutable state simplifies fault tolerance
> MapReduce: Map and Reduce tasks
> Storm: No support for state
> Spark: Immutable RDDs

Slide 12

Imperative Big Data Processing

> Programming distributed dataflow graphs requires learning new programming models

Slide 13

Imperative Big Data Processing

> Programming distributed dataflow graphs requires learning new programming models

Our Goal: Run Java programs with mutable state but with the performance and fault tolerance of distributed dataflow systems

Slide 14

Stateful Dataflow Graphs: From Imperative Programs to Distributed Dataflows

Program.java → SDGs: Stateful Dataflow Graphs

> Mutable distributed state in dataflow graphs
> @Annotations help with translation from Java to SDGs
> Checkpoint-based fault tolerance recovers mutable state after failure

Slide 15

Outline

•  Motivation
•  SDG: Stateful Dataflow Graphs
•  Handling distributed state in SDGs
•  Translating Java programs to SDGs
•  Checkpoint-based fault tolerance for SDGs
•  Experimental evaluation

Slide 16

SDG: Data, State and Computation

> SDGs separate data and state to allow data and pipeline parallelism

Task Elements (TEs) process data
State Elements (SEs) represent state
Dataflows represent data

> Task Elements have local access to State Elements
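
As a rough illustration of these three building blocks, the following Java sketch models a TE that updates a node-local SE. The interface and class names are hypothetical; they only illustrate the separation of data (the items flowing through), state (the SE) and computation (the TE), and are not the SEEP API.

import java.util.HashMap;
import java.util.Map;

interface StateElement { }                     // SE: mutable in-memory state

interface TaskElement<I, O> {                  // TE: processes dataflow items
    O process(I input);
}

class CounterState implements StateElement {
    final Map<String, Long> counts = new HashMap<String, Long>();
}

class WordCountTE implements TaskElement<String, Long> {
    private final CounterState state = new CounterState();  // local SE access

    public Long process(String word) {
        Long c = state.counts.get(word);
        long n = (c == null) ? 1 : c + 1;
        state.counts.put(word, n);             // mutate the SE in place
        return n;
    }
}

Because each TE instance only touches its local SE, many instances can process different data in parallel, while distinct TEs can be chained into a pipeline.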

Slide 17

Distributed Mutable State

State Elements support two abstractions for distributed mutable state:
–  Partitioned SEs: task elements always access state by key
–  Partial SEs: task elements can access complete state
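
One minimal way to picture the difference, using hypothetical interface names (not from the talk): a partitioned SE is only ever addressed through its partitioning key, while a partial SE exposes a node-local instance that can also be read globally.

// Hypothetical interfaces, for illustration only.
interface PartitionedSE<K, V> {
    V get(K key);              // access is always by the partitioning key
    void put(K key, V value);
}

interface PartialSE<S> {
    S local();                 // local access: only this node's instance
    // a global access collects one partial value per instance, which the
    // application then combines with its own merge logic (later slides)
}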

Slide 18

Distributed Mutable State: Partitioned SEs

> Partitioned SEs split into disjoint partitions

State partitioned according to a partitioning key; access is by key.
Dataflow routed according to a hash function: hash(msg.id)
Key space [0-N], split into partitions [0-k] and [(k+1)-N]

User-Item matrix (UI):
         Item-A  Item-B
User-A   4       5
User-B   0       5
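
A sketch of how a partitioned SE can stay consistent with the dataflow routing: both use the same hash of the partitioning key, so a message always reaches the node that owns its slice of the state. Class and method names are illustrative assumptions, not SEEP code.

import java.util.HashMap;
import java.util.Map;

class PartitionedMatrix {
    private final int numPartitions;
    private final Map<Integer, Map<Integer, Integer>>[] partitions;  // user -> (item -> rating)

    @SuppressWarnings("unchecked")
    PartitionedMatrix(int numPartitions) {
        this.numPartitions = numPartitions;
        this.partitions = new Map[numPartitions];
        for (int i = 0; i < numPartitions; i++)
            partitions[i] = new HashMap<Integer, Map<Integer, Integer>>();
    }

    // The same routing rule the dataflow uses: hash(msg.id) picks the partition.
    int route(int userId) {
        int p = userId % numPartitions;
        return (p < 0) ? p + numPartitions : p;
    }

    void setElement(int user, int item, int rating) {
        Map<Integer, Integer> row = partitions[route(user)].get(user);
        if (row == null) {
            row = new HashMap<Integer, Integer>();
            partitions[route(user)].put(user, row);
        }
        row.put(item, rating);
    }

    Map<Integer, Integer> getRow(int user) {
        Map<Integer, Integer> row = partitions[route(user)].get(user);
        return (row == null) ? new HashMap<Integer, Integer>() : row;
    }
}

Because the partitions are disjoint, each one can be updated by its own task element without coordinating with the others.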

Slide 19

Distributed Mutable State: Partial SEs

> Partial SE gives nodes local state instances
> Partial SE access by TEs can be local or global

Local access: data sent to one node
Global access: data sent to all nodes

Slide 20

Merging Distributed Mutable State

Merge logic

> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 21

Merging Distributed Mutable State

Multiple partial values → merge logic

> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 22

Merging Distributed Mutable State

Multiple partial values → collect partial values → merge logic

> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 23

Outline

> @Annotations

•  Motivation
•  SDG: Stateful Dataflow Graphs
•  Handling distributed state in SDGs
•  Translating Java programs to SDGs
•  Checkpoint-based fault tolerance for SDGs
•  Experimental evaluation

Slide 24

From Imperative Code to Execution

Program.java → Annotated program → SEEP

> SEEP: data-parallel processing platform

•  Translation occurs in two stages:
   –  Static code analysis: from Java to SDG
   –  Bytecode rewriting: from SDG to SEEP [SIGMOD'13]

Slide 25

Translation Process

Program.java → Extract TEs, SEs and accesses → Live variable analysis → TE and SE access code assembly → SEEP runnable
(SOOT Framework, Javassist, Janino)

> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections

Slide 26

Translation Process

Annotated Program.java → Extract TEs, SEs and accesses → Live variable analysis → TE and SE access code assembly → SEEP runnable
(SOOT Framework, Javassist, Janino)

> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections

Slide 27

Partitioned State Annotation

@Partitioned Matrix userItem = new SeepMatrix();
Matrix coOcc = new Matrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(coOcc, userItem);
}

Vector getRec(int user) {
    Vector userRow = userItem.getRow(user);
    Vector userRec = coOcc.multiply(userRow);
    return userRec;
}

> @Partitioned field annotation indicates partitioned state, accessed by key: hash(msg.id)

Slide 28

Partial State and Global Annotations

@Partitioned Matrix userItem = new SeepMatrix();
@Partial Matrix coOcc = new SeepMatrix();

void addRating(int user, int item, int rating) {
    userItem.setElement(user, item, rating);
    updateCoOccurrence(@Global coOcc, userItem);
}

> @Partial field annotation indicates partial state
> @Global annotates a variable to indicate access to all partial instances

Slide 29

Partial and Collection Annotations

@Partitioned Matrix userItem = new SeepMatrix();
@Partial Matrix coOcc = new SeepMatrix();

Vector getRec(int user) {
    Vector userRow = userItem.getRow(user);
    @Partial Vector puRec = @Global coOcc.multiply(userRow);
    Vector userRec = merge(puRec);
    return userRec;
}

Vector merge(@Collection Vector[] v) {
    /*...*/
}

> @Collection annotation indicates merge logic
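
The body of merge is elided on the slide. One plausible implementation for recommendation vectors is an element-wise sum of the partial results returned by the global access; the version below is purely hypothetical and uses plain double[] so that it is self-contained.

// Hypothetical merge logic: combine the partial recommendation vectors
// produced by every partial instance of coOcc into a single vector.
class MergeExample {
    static double[] merge(double[][] partials) {    // stands in for @Collection Vector[]
        double[] result = new double[partials[0].length];
        for (double[] partial : partials) {
            for (int i = 0; i < partial.length; i++) {
                result[i] += partial[i];            // element-wise sum of partial values
            }
        }
        return result;
    }
}

Any application-specific combination could be used here; summation is simply a natural choice when the partial values are per-instance products of co-occurrence counts.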

Slide 30

Outline

> Failures

•  Motivation
•  SDG: Stateful Dataflow Graphs
•  Handling distributed state in SDGs
•  Translating Java programs to SDGs
•  Checkpoint-based fault tolerance for SDGs
•  Experimental evaluation

Slide 31

Challenges of Making SDGs Fault Tolerant

Physical deployment of an SDG

> Task elements access local in-memory state
> Node failures may lead to state loss

Slide 32

Challenges of Making SDGs Fault Tolerant

Physical deployment of an SDG: physical nodes hold state in RAM

> Task elements access local in-memory state
> Node failures may lead to state loss

Slide 33

Challenges of Making SDGs Fault Tolerant

Physical deployment of an SDG: physical nodes hold state in RAM

> Task elements access local in-memory state
> Node failures may lead to state loss

Checkpointing State
•  No updates allowed while state is being checkpointed
•  Checkpointing state should not impact the data processing path

Slide 34

Challenges of Making SDGs Fault Tolerant

Physical deployment of an SDG: physical nodes hold state in RAM

> Task elements access local in-memory state
> Node failures may lead to state loss

Checkpointing State
•  No updates allowed while state is being checkpointed
•  Checkpointing state should not impact the data processing path

State Backup
•  Backups are large and cannot be stored in memory
•  Large writes to disk through the network have high cost

Slide 35

Checkpoint Mechanism for Fault Tolerance

Asynchronous, lock-free checkpointing:
1.  Freeze mutable state for checkpointing
2.  Dirty state supports updates concurrently
3.  Reconcile dirty state
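
A compact sketch of the three steps, under stated assumptions: the state is modelled as a simple key-to-count map, and coarse synchronized blocks stand in for the finer-grained, lock-free mechanism the slide refers to. This illustrates the idea, not the SEEP implementation.

import java.util.HashMap;
import java.util.Map;

class CheckpointableState {
    private Map<String, Long> state = new HashMap<String, Long>();
    private Map<String, Long> dirty = null;   // non-null while a checkpoint is in flight

    // Step 1: freeze. The returned snapshot is backed up asynchronously;
    // from now on, updates are diverted to the dirty buffer.
    synchronized Map<String, Long> freeze() {
        dirty = new HashMap<String, Long>();
        return state;
    }

    // Step 2: updates keep flowing while the snapshot is written out.
    synchronized void update(String key, long delta) {
        Map<String, Long> target = (dirty != null) ? dirty : state;
        Long old = target.get(key);
        target.put(key, (old == null ? 0 : old) + delta);
    }

    // Step 3: reconcile the dirty updates into the (already backed-up) state.
    synchronized void reconcile() {
        if (dirty == null) return;
        for (Map.Entry<String, Long> e : dirty.entrySet()) {
            Long old = state.get(e.getKey());
            state.put(e.getKey(), (old == null ? 0 : old) + e.getValue());
        }
        dirty = null;
    }
}

The point is that the frozen snapshot can be written out off the data processing path while updates continue against the dirty buffer, and reconcile() folds those updates back in once the backup completes.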

Slide 36

Distributed M to N Checkpoint Backup

M to N distributed backup and parallel recovery
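
A toy illustration, with hypothetical names, of why splitting each node's checkpoint across N backups helps: the chunks can be stored independently and, after a failure, fetched and restored in parallel. Threads stand in for the remote backup nodes.

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MtoNBackup {

    // Split one node's checkpoint into N disjoint chunks by key.
    static List<Map<Integer, Long>> split(Map<Integer, Long> checkpoint, int n) {
        List<Map<Integer, Long>> chunks = new ArrayList<Map<Integer, Long>>();
        for (int i = 0; i < n; i++) chunks.add(new HashMap<Integer, Long>());
        for (Map.Entry<Integer, Long> e : checkpoint.entrySet()) {
            int idx = e.getKey() % n;
            if (idx < 0) idx += n;
            chunks.get(idx).put(e.getKey(), e.getValue());
        }
        return chunks;
    }

    // Parallel recovery: restore all chunks concurrently and merge them.
    static Map<Integer, Long> recover(List<Map<Integer, Long>> chunks)
            throws InterruptedException {
        final Map<Integer, Long> restored =
                Collections.synchronizedMap(new HashMap<Integer, Long>());
        List<Thread> workers = new ArrayList<Thread>();
        for (final Map<Integer, Long> chunk : chunks) {
            Thread t = new Thread(new Runnable() {
                public void run() { restored.putAll(chunk); }  // fetch + replay one chunk
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
        return restored;
    }
}

In a real deployment the chunks would live on different physical nodes; here they are simply maps restored by concurrent threads.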

Slide 45

Evaluation of SDG Performance

How does mutable state impact performance?
How efficient are translated SDGs?
What is the throughput/latency trade-off?

Experimental set-up:
–  Amazon EC2 (c1 and m1 xlarge instances)
–  Private cluster (4-core 3.4 GHz Intel Xeon servers with 8 GB RAM)
–  Sun Java 7, Ubuntu 12.04, Linux kernel 3.10

Slide 46

Processing with Large Mutable State

> addRating and getRec functions from the recommender algorithm, while changing the read/write ratio

[Figure: throughput (1000 requests/s) and latency (ms) vs. workload (state read/write ratio)]

Combines batch and online processing to serve fresh results over large mutable state

Slide 47

Efficiency of Translated SDG

> Batch-oriented, iterative logistic regression

[Figure: throughput (GB/s) vs. number of nodes, SDG vs. Spark]

Translated SDG achieves performance similar to a non-mutable dataflow

Slide 48

Latency/Throughput Tradeoff

> Streaming word count query, reporting counts over windows

[Figure: throughput (1000 requests/s) vs. window size (ms), SDG vs. Naiad-LowLatency]

SDGs achieve high throughput while maintaining low latency

Slide 49

Latency/Throughput Tradeoff

> Streaming word count query, reporting counts over windows

[Figures: throughput (1000 requests/s) vs. window size (ms); SDG vs. Naiad-LowLatency, and SDG vs. Naiad-HighThroughput and Streaming Spark]

SDGs achieve high throughput while maintaining low latency

Slide 50

Latency/Throughput Tradeoff

> Streaming word count query, reporting counts over windows

[Figures: throughput (1000 requests/s) vs. window size (ms), comparing SDG with Naiad-LowLatency, Naiad-HighThroughput, and Streaming Spark]

SDGs achieve high throughput while maintaining low latency

Slide 51

Summary

Running Java programs with the performance of current distributed dataflow frameworks

SDG: Stateful Dataflow Graphs
–  Abstractions for distributed mutable state
–  Annotations to disambiguate types of distributed state and state access
–  Checkpoint-based fault tolerance mechanism

Slide 52

Summary

Running Java programs with the performance of current distributed dataflow frameworks

SDG: Stateful Dataflow Graphs
–  Abstractions for distributed mutable state
–  Annotations to disambiguate types of distributed state and state access
–  Checkpoint-based fault tolerance mechanism

Thank you! Any questions?
@raulcfernandez
[email protected]
https://github.com/raulcf/SEEPng/