Slide 1

Slide 1 text

THE ABSTRACTION THAT POWERS BIG DATA
RAÚL CASTRO FERNÁNDEZ, COMPUTER SCIENCE PHD STUDENT, IMPERIAL COLLEGE

Slide 2

Slide 2 text

Dataflows: The Abstraction that Powers Big Data
Raul Castro Fernandez
Imperial College London
[email protected]
@raulcfernandez

Slide 3

Slide 3 text

“Big Data needs Democratization”

Slide 4

Slide 4 text

Democratization of Data
Developers and DBAs are no longer the only ones generating, processing, and analyzing data.

Slide 5

Slide 5 text

Democratization of Data
Developers and DBAs are no longer the only ones generating, processing, and analyzing data.
Decision makers, domain scientists, application users, journalists, crowd workers, and everyday consumers, sales, marketing…

Slide 6

Slide 6 text

+ Everyone has data

Slide 7

Slide 7 text

+ Everyone has data
+ Many have interesting questions

Slide 8

Slide 8 text

+ Everyone has data
+ Many have interesting questions
- Not everyone knows how to analyze it

Slide 9

Slide 9 text

+ Everyone has data
+ Many have interesting questions
- Not everyone knows how to analyze it

Slide 10

Slide 10 text

Bob   Local Expert

Slide 11

Slide 11 text

Bob   Local Expert

Slide 12

Slide 12 text

Bob   Local Expert
- Barrier of human communication
- Barrier of professional relations

Slide 13

Slide 13 text

Bob   Local Expert
- Barrier of human communication
- Barrier of professional relations
“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1922)

Slide 14

Slide 14 text

First step to democratize Big Data: to offer a familiar programming interface

Slide 15

Slide 15 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation

Slide 16

Slide 16 text

Mutable State in a Recommender System

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

  User-Item matrix (UI):
             Item-A   Item-B
    User-A      4        5
    User-B      0        5

  Co-Occurrence matrix (CO):
             Item-A   Item-B
    Item-A      1        1
    Item-B      1        2

Slide 17

Slide 17 text

Mutable State in a Recommender System

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(coOcc, userItem);   // update with new ratings
  }

  User-Item matrix (UI):
             Item-A   Item-B
    User-A      4        5
    User-B      0        5

  Co-Occurrence matrix (CO):
             Item-A   Item-B
    Item-A      1        1
    Item-B      1        2

Slide 18

Slide 18 text

Mutable State in a Recommender System

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(coOcc, userItem);   // update with new ratings
  }

  Vector getRec(int user) {
      Vector userRow = userItem.getRow(user);
      Vector userRec = coOcc.multiply(userRow);   // multiply for recommendation
      return userRec;
  }

  User-Item matrix (UI):
             Item-A   Item-B
    User-A      4        5
    User-B      0        5

  Co-Occurrence matrix (CO):
             Item-A   Item-B
    Item-A      1        1
    Item-B      1        2

  [Figure: the co-occurrence matrix multiplied by User-B's row to produce a recommendation]
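
The slide code relies on Matrix, Vector, and updateCoOccurrence, which the deck never defines. Below is a minimal sketch of what such classes might look like, using a sparse map-based representation; the storage layout, method bodies, and the indices() accessor are assumptions for illustration only, not the actual SEEP classes (updateCoOccurrence is omitted).

  import java.util.Collections;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.Set;

  class Vector {
      private final Map<Integer, Integer> values;   // index -> value
      Vector(Map<Integer, Integer> values) { this.values = values; }
      int get(int i) { return values.getOrDefault(i, 0); }
      Set<Integer> indices() { return values.keySet(); }
  }

  class Matrix {
      // sparse row-major storage: row -> (column -> value)
      private final Map<Integer, Map<Integer, Integer>> rows = new HashMap<>();

      void setElement(int row, int col, int value) {
          rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
      }

      Vector getRow(int row) {
          return new Vector(rows.getOrDefault(row, Collections.emptyMap()));
      }

      // matrix-vector product: result[i] = sum_j M[i][j] * v[j]
      Vector multiply(Vector v) {
          Map<Integer, Integer> result = new HashMap<>();
          for (Map.Entry<Integer, Map<Integer, Integer>> row : rows.entrySet()) {
              int sum = 0;
              for (Map.Entry<Integer, Integer> cell : row.getValue().entrySet()) {
                  sum += cell.getValue() * v.get(cell.getKey());
              }
              result.put(row.getKey(), sum);
          }
          return new Vector(result);
      }
  }

With a sketch like this, the addRating and getRec methods from the slide run as ordinary single-node Java; the rest of the talk is about keeping that programming style while distributing and protecting the two matrices.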

Slide 19

Slide 19 text

Challenges When Executing with Big Data

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

Big Data problem: the matrices become large.
> Mutable state leads to concise algorithms but complicates parallelism and fault tolerance
> Cannot lose state after failure
> Need to manage state to support data-parallelism

Slide 20

Slide 20 text

Using Current Distributed Dataflow Frameworks
[Figure: input data flowing through a dataflow to output data]
> No mutable state simplifies fault tolerance
> MapReduce: Map and Reduce tasks
> Storm: no support for state
> Spark: immutable RDDs

Slide 21

Slide 21 text

Imperative Big Data Processing
> Programming distributed dataflow graphs requires learning new programming models

Slide 22

Slide 22 text

Imperative Big Data Processing
> Programming distributed dataflow graphs requires learning new programming models
Our goal: run Java programs with mutable state, but with the performance and fault tolerance of distributed dataflow systems

Slide 23

Slide 23 text

Stateful Dataflow Graphs: From Imperative Programs to Distributed Dataflows
Program.java → SDGs: Stateful Dataflow Graphs
> @Annotations help with translation from Java to SDGs
> Mutable distributed state in dataflow graphs
> Checkpoint-based fault tolerance recovers mutable state after failure

Slide 24

Slide 24 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation

Slide 25

Slide 25 text

SDG: Data, State and Computation
> SDGs separate data and state to allow data and pipeline parallelism
  - Task Elements (TEs) process data
  - State Elements (SEs) represent state
  - Dataflows represent data
> Task Elements have local access to State Elements

Slide 26

Slide 26 text

Distributed Mutable State
State Elements support two abstractions for distributed mutable state:
  - Partitioned SEs: task elements always access state by key
  - Partial SEs: task elements can access complete state

Slide 27

Slide 27 text

Distributed Mutable State: Partitioned SEs
> Partitioned SEs split into disjoint partitions
  - Dataflow routed according to a hash function: hash(msg.id) over the key space [0-N], split into sub-ranges such as [0-k] and [(k+1)-N]
  - State partitioned according to the partitioning key; task elements access it by key
[Figure: User-Item matrix (UI) split across partitions]
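
A rough sketch of that routing idea: hash the message identifier into the key space [0..N] and send it to the partition whose contiguous sub-range contains the key. The class name, key space, and range scheme below are assumptions for illustration, not SEEP code.

  class PartitionedRouter {
      private final int numPartitions;
      private final int keySpace;   // keys fall in [0..keySpace]

      PartitionedRouter(int numPartitions, int keySpace) {
          this.numPartitions = numPartitions;
          this.keySpace = keySpace;
      }

      // route a message to the partition owning hash(msg.id)
      int partitionFor(int msgId) {
          int key = Math.floorMod(Integer.hashCode(msgId), keySpace + 1);   // hash into [0..N]
          int rangeSize = (keySpace + numPartitions) / numPartitions;       // ceil((N+1) / numPartitions)
          return Math.min(key / rangeSize, numPartitions - 1);              // disjoint sub-ranges [0-k], [(k+1)-N], ...
      }
  }

A task element accessing a partitioned SE would consult a function like partitionFor to decide which partition's instance handles the tuple.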

Slide 28

Slide 28 text

Distributed Mutable State: Partial SEs
> Partial SE gives nodes local state instances
> Partial SE access by TEs can be local or global
  - Local access: data sent to one node
  - Global access: data sent to all nodes
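
One way to picture the local/global distinction: a local access delivers a tuple to a single node's partial state instance, while a global access broadcasts it to every instance. The dispatcher below is a minimal sketch with assumed names, not part of SEEP.

  import java.util.List;

  interface PartialStateInstance {
      void apply(Object tuple);   // apply a tuple against this node's local state
  }

  class PartialStateDispatcher {
      private final List<PartialStateInstance> instances;   // one partial SE instance per node

      PartialStateDispatcher(List<PartialStateInstance> instances) {
          this.instances = instances;
      }

      // local access: data sent to one node
      void accessLocal(Object tuple, int node) {
          instances.get(node).apply(tuple);
      }

      // global access: data sent to all nodes
      void accessGlobal(Object tuple) {
          for (PartialStateInstance instance : instances) {
              instance.apply(tuple);
          }
      }
  }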

Slide 29

Slide 29 text

Merging Distributed Mutable State
> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 30

Slide 30 text

Merging Distributed Mutable State
[Figure: multiple partial values flow into the merge logic]
> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 31

Slide 31 text

Merging Distributed Mutable State
[Figure: multiple partial values are collected and passed to the merge logic]
> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 32

Slide 32 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs (@Annotations)
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation

Slide 33

Slide 33 text

From Imperative Code to Execution
Annotated program → SEEP
> SEEP: data-parallel processing platform
• Translation occurs in two stages:
  - Static code analysis
  - Generation of runnable code

Slide 34

Slide 34 text

Translation Process
Program.java → Extract TEs, SEs and accesses → Live variable analysis → TE and SE access code assembly → SEEP runnable
(SOOT Framework, Javassist)
> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections

Slide 35

Slide 35 text

Translation Process
Annotated Program.java → Extract TEs, SEs and accesses → Live variable analysis → TE and SE access code assembly → SEEP runnable
(SOOT Framework, Javassist)
> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections

Slide 36

Slide 36 text

Partitioned State Annotation

  @Partitioned Matrix userItem = new SeepMatrix();
  Matrix coOcc = new Matrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(coOcc, userItem);
  }

  Vector getRec(int user) {
      Vector userRow = userItem.getRow(user);
      Vector userRec = coOcc.multiply(userRow);
      return userRec;
  }

> @Partitioned field annotation indicates partitioned state

Slide 37

Slide 37 text

Partial State and Global Annotations

  @Partitioned Matrix userItem = new SeepMatrix();
  @Partial Matrix coOcc = new SeepMatrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(@Global coOcc, userItem);
  }

> @Partial field annotation indicates partial state
> @Global annotates a variable to indicate access to all partial instances

Slide 38

Slide 38 text

Partial and Collection Annotations

  @Partitioned Matrix userItem = new SeepMatrix();
  @Partial Matrix coOcc = new SeepMatrix();

  Vector getRec(int user) {
      Vector userRow = userItem.getRow(user);
      @Partial Vector puRec = @Global coOcc.multiply(userRow);
      Vector userRec = merge(puRec);
      return userRec;
  }

  Vector merge(@Collection Vector[] v) {
      /*…*/
  }

> @Collection annotation indicates merge logic
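
The merge body is left empty on the slide. For a recommendation vector, one plausible merge is an element-wise sum of the partial results, since each partial instance contributes part of the overall score. The sketch below assumes the Vector helper (get/indices) from the sketch after Slide 18 and only illustrates what application-specific merge logic could look like; it is not the deck's implementation.

  import java.util.HashMap;
  import java.util.Map;

  class MergeExample {
      // combine partial recommendation vectors by element-wise addition
      static Vector merge(Vector[] partials) {
          Map<Integer, Integer> merged = new HashMap<>();
          for (Vector partial : partials) {
              for (int i : partial.indices()) {
                  merged.merge(i, partial.get(i), Integer::sum);
              }
          }
          return new Vector(merged);
      }
  }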

Slide 39

Slide 39 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs (failures)
• Experimental evaluation

Slide 40

Slide 40 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG
> Task elements access local in-memory state
> Node failures may lead to state loss

Slide 41

Slide 41 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG: state held in RAM on physical nodes
> Task elements access local in-memory state
> Node failures may lead to state loss

Slide 42

Slide 42 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG: state held in RAM on physical nodes
> Task elements access local in-memory state
> Node failures may lead to state loss
Checkpointing state:
• No updates allowed while state is being checkpointed
• Checkpointing state should not impact the data processing path

Slide 43

Slide 43 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG: state held in RAM on physical nodes
> Task elements access local in-memory state
> Node failures may lead to state loss
Checkpointing state:
• No updates allowed while state is being checkpointed
• Checkpointing state should not impact the data processing path
State backup:
• Backups are large and cannot be stored in memory
• Large writes to disk through the network have a high cost

Slide 44

Slide 44 text

Checkpoint Mechanism for Fault Tolerance
Asynchronous, lock-free checkpointing:
1. Freeze mutable state for checkpointing
2. Dirty state supports updates concurrently
3. Reconcile dirty state
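
A minimal sketch of the freeze/dirty/reconcile idea: while the frozen state is checkpointed in the background, updates land in a separate dirty map that shadows the snapshot for reads and is folded back in once the checkpoint completes. The data structures and method names are assumptions for illustration and ignore corner cases (deletes, races between the flag check and the write) that the real SEEP mechanism has to handle.

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  class CheckpointableState {
      private volatile boolean checkpointing = false;
      private final Map<Integer, Integer> state = new ConcurrentHashMap<>();   // frozen during a checkpoint
      private final Map<Integer, Integer> dirty = new ConcurrentHashMap<>();   // absorbs concurrent updates

      // 1. freeze mutable state: from now on, writes go to the dirty map,
      //    and the frozen `state` map can be serialized asynchronously
      void beginCheckpoint() { checkpointing = true; }

      void update(int key, int value) {
          if (checkpointing) {
              dirty.put(key, value);    // 2. dirty state supports updates concurrently
          } else {
              state.put(key, value);
          }
      }

      int read(int key) {
          Integer v = dirty.get(key);   // dirty values shadow the frozen snapshot
          return (v != null) ? v : state.getOrDefault(key, 0);
      }

      // 3. reconcile: fold dirty updates back into the main state
      void endCheckpoint() {
          state.putAll(dirty);
          dirty.clear();
          checkpointing = false;
      }
  }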

Slide 45

Slide 45 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 46

Slide 46 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 47

Slide 47 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 48

Slide 48 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 49

Slide 49 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 50

Slide 50 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 51

Slide 51 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 52

Slide 52 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 53

Slide 53 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery
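
The deck does not show code for the M-to-N scheme, but the idea of splitting one node's checkpoint into chunks that are scattered over several backup nodes (so recovery can proceed in parallel) can be sketched as below. The chunk size, round-robin placement, and class name are assumptions for illustration, not the SEEP implementation.

  import java.util.ArrayList;
  import java.util.List;

  class MToNCheckpointBackup {
      // split a checkpoint into fixed-size chunks so they can be
      // scattered across backup nodes and restored in parallel
      static List<byte[]> split(byte[] checkpoint, int chunkSize) {
          List<byte[]> chunks = new ArrayList<>();
          for (int off = 0; off < checkpoint.length; off += chunkSize) {
              int len = Math.min(chunkSize, checkpoint.length - off);
              byte[] chunk = new byte[len];
              System.arraycopy(checkpoint, off, chunk, 0, len);
              chunks.add(chunk);
          }
          return chunks;
      }

      // place chunk i on backup node i mod N (round-robin over the N backup nodes)
      static int backupNodeFor(int chunkIndex, int numBackupNodes) {
          return chunkIndex % numBackupNodes;
      }
  }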

Slide 54

Slide 54 text

Evaluation of SDG Performance
How does mutable state impact performance?
How efficient are translated SDGs?
What is the throughput/latency trade-off?
Experimental set-up:
  - Amazon EC2 (c1 and m1 xlarge instances)
  - Private cluster (4-core 3.4 GHz Intel Xeon servers with 8 GB RAM)
  - Sun Java 7, Ubuntu 12.04, Linux kernel 3.10

Slide 55

Slide 55 text

Processing with Large Mutable State
[Figure: throughput (1,000 requests/s) and latency (ms) vs. workload (state read/write ratio)]
> addRating and getRec functions from the recommender algorithm, while changing the read/write ratio
Combines batch and online processing to serve fresh results over large mutable state

Slide 56

Slide 56 text

Efficiency of Translated SDG
[Figure: throughput (GB/s) vs. number of nodes, SDG vs. Spark]
> Batch-oriented, iterative logistic regression
Translated SDG achieves performance similar to a non-mutable dataflow

Slide 57

Slide 57 text

Latency/Throughput Tradeoff
[Figure: throughput (1,000 requests/s) vs. window size (ms): SDG, Naiad-LowLatency]
> Streaming word count query, reporting counts over windows
SDGs achieve high throughput while maintaining low latency

Slide 58

Slide 58 text

Latency/Throughput Tradeoff
[Figure: throughput (1,000 requests/s) vs. window size (ms): SDG, Naiad-LowLatency, Naiad-HighThroughput, Streaming Spark]
> Streaming word count query, reporting counts over windows
SDGs achieve high throughput while maintaining low latency

Slide 59

Slide 59 text

Latency/Throughput Tradeoff
[Figure: throughput (1,000 requests/s) vs. window size (ms): SDG, Naiad-LowLatency, Naiad-HighThroughput, Streaming Spark]
> Streaming word count query, reporting counts over windows
SDGs achieve high throughput while maintaining low latency

Slide 60

Slide 60 text

Summary
Running Java programs with the performance of current distributed dataflow frameworks
SDG: Stateful Dataflow Graphs
  - Abstractions for distributed mutable state
  - Annotations to disambiguate types of distributed state and state access
  - Checkpoint-based fault tolerance mechanism

Slide 61

Slide 61 text

Summary
Running Java programs with the performance of current distributed dataflow frameworks
SDG: Stateful Dataflow Graphs
  - Abstractions for distributed mutable state
  - Annotations to disambiguate types of distributed state and state access
  - Checkpoint-based fault tolerance mechanism
Thank you! Any questions?
@raulcfernandez
[email protected]
https://github.com/lsds/Seep/
https://github.com/raulcf/SEEPng/

Slide 62

Slide 62 text

BACKUP SLIDES

Slide 63

Slide 63 text

Scalability on State Size and Throughput
[Figure: throughput (million requests/s) and latency (ms) vs. aggregated memory (GB)]
> Increase state size in a mutable KV store
Support large state without compromising throughput or latency while staying fault tolerant

Slide 64

Slide 64 text

Iteration in SDG
> Local iteration supported by one node
> Iteration across TEs requires a cycle in the dataflow

Slide 65

Slide 65 text

Types of Annotations
• Partition
• Partial
• Global
• Partial
• Collection
• Data annotations
  - Batch
  - Stream

Slide 66

Slide 66 text

Overhead of SDG Fault Tolerance
[Figures: latency (ms) vs. state size (GB), and latency (ms) vs. checkpoint frequency (s), with and without fault tolerance]
The fault tolerance mechanism's impact on performance and latency is small.
State size and checkpointing frequency do not affect performance.

Slide 67

Slide 67 text

Fault Tolerance Overhead
[Figure: throughput (10,000 requests/s) and latency (ms) vs. aggregated memory (MB): SDG, Naiad-NoDisk, Naiad-Disk]

Slide 68

Slide 68 text

Recovery Times
[Figure: recovery time (s) vs. state size (GB) for 1-to-1, 2-to-1, 1-to-2, and 2-to-2 recovery]

Slide 69

Slide 69 text

Stragglers
[Figure: throughput (1,000 requests/s) and number of nodes over time (s)]

Slide 70

Slide 70 text

Fault Tolerance: Sync vs. Async
[Figure: throughput (1,000 requests/s) and latency (s) vs. state size (GB) for synchronous and asynchronous checkpointing]

Slide 71

Slide 71 text

Comparison to State-of-the-Art

  System      Large State   Mutable State   Low Latency   Iteration
  MapReduce   n/a           n/a             No            No
  Spark       n/a           n/a             No            Yes
  Storm       n/a           n/a             Yes           No
  Naiad       No            Yes             Yes           Yes
  SDG         Yes           Yes             Yes           Yes

SDGs are the first stateful, fault-tolerant model enabling execution of imperative code with explicit state

Slide 72

Slide 72 text

Characteristics of SDGs
> Runtime data parallelism (elasticity): adaptation to varying workloads and a mechanism against stragglers
> Support for cyclic graphs: efficiently represents iterative algorithms
> Low latency: pipelining tasks decreases latency

Slide 73

Slide 73 text

Bob   Local Expert
Hi, I have a query to run on “Big Data”
Ok, cool, tell me about it
I want to know sales per employee on Saturdays
… well … ok, come in 3 days
Well, this is actually pretty urgent…
… 2 days, I'm pretty busy
2 Days After
Hi! You have the results?
Yes, here you have your sales last Saturday
My sales? I meant all employee sales, and not only last Saturday
Oops, sorry for that, give me 2 days…

Slide 74

Slide 74 text

17TH ~ 18th NOV 2014 MADRID (SPAIN)