Slide 1

Slide 1 text

THE ABSTRACTION THAT POWERS BIG DATA
RAÚL CASTRO FERNÁNDEZ, COMPUTER SCIENCE PHD STUDENT, IMPERIAL COLLEGE

Slide 2

Slide 2 text

Dataflows: The Abstraction that Powers Big Data
Raul Castro Fernandez
Imperial College London
[email protected]
@raulcfernandez

Slide 3

Slide 3 text

“Big Data needs Democratization”

Slide 4

Slide 4 text

Democratization of Data
Developers and DBAs are no longer the only ones generating, processing, and analyzing data.

Slide 5

Slide 5 text

Democratization of Data
Developers and DBAs are no longer the only ones generating, processing, and analyzing data.
Decision makers, domain scientists, application users, journalists, crowd workers, and everyday consumers, sales, marketing…

Slide 6

Slide 6 text

+ Everyone has data

Slide 7

Slide 7 text

+ Everyone has data
+ Many have interesting questions

Slide 8

Slide 8 text

+ Everyone has data
+ Many have interesting questions
- Not everyone knows how to analyze it

Slide 9

Slide 9 text

+ Everyone has data
+ Many have interesting questions
- Not everyone knows how to analyze it

Slide 10

Slide 10 text

Bob   Local Expert

Slide 11

Slide 11 text

Bob   Local Expert

Slide 12

Slide 12 text

Bob   Local Expert
- Barrier of human communication
- Barrier of professional relations

Slide 13

Slide 13 text

Bob   Local Expert
- Barrier of human communication
- Barrier of professional relations
“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein, Tractatus Logico-Philosophicus (1922)

Slide 14

Slide 14 text

First step to democratize Big Data: to offer a familiar programming interface

Slide 15

Slide 15 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation

Slide 16

Slide 16 text

Mutable State in a Recommender System

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

  User-Item matrix (UI):
             Item-A   Item-B
    User-A      4        5
    User-B      0        5

  Co-Occurrence matrix (CO):
             Item-A   Item-B
    Item-A      1        1
    Item-B      1        2

Slide 17

Slide 17 text

Mutable State in a Recommender System

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(coOcc, userItem);   // update with new ratings
  }

  User-Item matrix (UI):
             Item-A   Item-B
    User-A      4        5
    User-B      0        5

  Co-Occurrence matrix (CO):
             Item-A   Item-B
    Item-A      1        1
    Item-B      1        2

Slide 18

Slide 18 text

Mutable State in a Recommender System

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(coOcc, userItem);   // update with new ratings
  }

  Vector getRec(int user) {
      Vector userRow = userItem.getRow(user);
      Vector userRec = coOcc.multiply(userRow);   // multiply for recommendation
      return userRec;
  }

  User-Item matrix (UI):
             Item-A   Item-B
    User-A      4        5
    User-B      0        5

  Co-Occurrence matrix (CO):
             Item-A   Item-B
    Item-A      1        1
    Item-B      1        2

  [Figure: the co-occurrence matrix multiplied by User-B's row to produce a recommendation]
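
The slide code relies on Matrix, Vector, and updateCoOccurrence, which the deck never defines. Below is a minimal sketch of what such classes might look like, using a sparse map-based representation; the storage layout, method bodies, and the indices() accessor are assumptions for illustration only, not the actual SEEP classes (updateCoOccurrence is omitted).

  import java.util.Collections;
  import java.util.HashMap;
  import java.util.Map;
  import java.util.Set;

  class Vector {
      private final Map<Integer, Integer> values;   // index -> value
      Vector(Map<Integer, Integer> values) { this.values = values; }
      int get(int i) { return values.getOrDefault(i, 0); }
      Set<Integer> indices() { return values.keySet(); }
  }

  class Matrix {
      // sparse row-major storage: row -> (column -> value)
      private final Map<Integer, Map<Integer, Integer>> rows = new HashMap<>();

      void setElement(int row, int col, int value) {
          rows.computeIfAbsent(row, r -> new HashMap<>()).put(col, value);
      }

      Vector getRow(int row) {
          return new Vector(rows.getOrDefault(row, Collections.emptyMap()));
      }

      // matrix-vector product: result[i] = sum_j M[i][j] * v[j]
      Vector multiply(Vector v) {
          Map<Integer, Integer> result = new HashMap<>();
          for (Map.Entry<Integer, Map<Integer, Integer>> row : rows.entrySet()) {
              int sum = 0;
              for (Map.Entry<Integer, Integer> cell : row.getValue().entrySet()) {
                  sum += cell.getValue() * v.get(cell.getKey());
              }
              result.put(row.getKey(), sum);
          }
          return new Vector(result);
      }
  }

With a sketch like this, the addRating and getRec methods from the slide run as ordinary single-node Java; the rest of the talk is about keeping that programming style while distributing and protecting the two matrices.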

Slide 19

Slide 19 text

Challenges When Executing with Big Data

  Matrix userItem = new Matrix();
  Matrix coOcc = new Matrix();

Big Data problem: the matrices become large.
> Mutable state leads to concise algorithms but complicates parallelism and fault tolerance
> Cannot lose state after failure
> Need to manage state to support data-parallelism

Slide 20

Slide 20 text

Using Current Distributed Dataflow Frameworks
[Figure: input data flowing through a dataflow to output data]
> No mutable state simplifies fault tolerance
> MapReduce: Map and Reduce tasks
> Storm: no support for state
> Spark: immutable RDDs

Slide 21

Slide 21 text

Imperative Big Data Processing
> Programming distributed dataflow graphs requires learning new programming models

Slide 22

Slide 22 text

Imperative Big Data Processing
> Programming distributed dataflow graphs requires learning new programming models
Our goal: run Java programs with mutable state, but with the performance and fault tolerance of distributed dataflow systems

Slide 23

Slide 23 text

Stateful Dataflow Graphs: From Imperative Programs to Distributed Dataflows
Program.java → SDGs: Stateful Dataflow Graphs
> @Annotations help with translation from Java to SDGs
> Mutable distributed state in dataflow graphs
> Checkpoint-based fault tolerance recovers mutable state after failure

Slide 24

Slide 24 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation

Slide 25

Slide 25 text

SDG: Data, State and Computation
> SDGs separate data and state to allow data and pipeline parallelism
  - Task Elements (TEs) process data
  - State Elements (SEs) represent state
  - Dataflows represent data
> Task Elements have local access to State Elements

Slide 26

Slide 26 text

Distributed Mutable State
State Elements support two abstractions for distributed mutable state:
  - Partitioned SEs: task elements always access state by key
  - Partial SEs: task elements can access complete state

Slide 27

Slide 27 text

Distributed Mutable State: Partitioned SEs
> Partitioned SEs split into disjoint partitions
  - Dataflow routed according to a hash function: hash(msg.id) over the key space [0-N], split into sub-ranges such as [0-k] and [(k+1)-N]
  - State partitioned according to the partitioning key; task elements access it by key
[Figure: User-Item matrix (UI) split across partitions]
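
A rough sketch of that routing idea: hash the message identifier into the key space [0..N] and send it to the partition whose contiguous sub-range contains the key. The class name, key space, and range scheme below are assumptions for illustration, not SEEP code.

  class PartitionedRouter {
      private final int numPartitions;
      private final int keySpace;   // keys fall in [0..keySpace]

      PartitionedRouter(int numPartitions, int keySpace) {
          this.numPartitions = numPartitions;
          this.keySpace = keySpace;
      }

      // route a message to the partition owning hash(msg.id)
      int partitionFor(int msgId) {
          int key = Math.floorMod(Integer.hashCode(msgId), keySpace + 1);   // hash into [0..N]
          int rangeSize = (keySpace + numPartitions) / numPartitions;       // ceil((N+1) / numPartitions)
          return Math.min(key / rangeSize, numPartitions - 1);              // disjoint sub-ranges [0-k], [(k+1)-N], ...
      }
  }

A task element accessing a partitioned SE would consult a function like partitionFor to decide which partition's instance handles the tuple.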

Slide 28

Slide 28 text

Distributed Mutable State: Partial SEs
> Partial SE gives nodes local state instances
> Partial SE access by TEs can be local or global
  - Local access: data sent to one node
  - Global access: data sent to all nodes
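
One way to picture the local/global distinction: a local access delivers a tuple to a single node's partial state instance, while a global access broadcasts it to every instance. The dispatcher below is a minimal sketch with assumed names, not part of SEEP.

  import java.util.List;

  interface PartialStateInstance {
      void apply(Object tuple);   // apply a tuple against this node's local state
  }

  class PartialStateDispatcher {
      private final List<PartialStateInstance> instances;   // one partial SE instance per node

      PartialStateDispatcher(List<PartialStateInstance> instances) {
          this.instances = instances;
      }

      // local access: data sent to one node
      void accessLocal(Object tuple, int node) {
          instances.get(node).apply(tuple);
      }

      // global access: data sent to all nodes
      void accessGlobal(Object tuple) {
          for (PartialStateInstance instance : instances) {
              instance.apply(tuple);
          }
      }
  }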

Slide 29

Slide 29 text

Merging Distributed Mutable State
> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 30

Slide 30 text

Merging Distributed Mutable State
[Figure: multiple partial values flow into the merge logic]
> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 31

Slide 31 text

Merging Distributed Mutable State
[Figure: multiple partial values are collected and passed to the merge logic]
> Reading all partial SE instances results in a set of partial values
> Requires application-specific merge logic

Slide 32

Slide 32 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs (@Annotations)
• Checkpoint-based fault tolerance for SDGs
• Experimental evaluation

Slide 33

Slide 33 text

From Imperative Code to Execution
Annotated program → SEEP
> SEEP: data-parallel processing platform
• Translation occurs in two stages:
  - Static code analysis
  - Generation of runnable code

Slide 34

Slide 34 text

Translation Process
Program.java → Extract TEs, SEs and accesses → Live variable analysis → TE and SE access code assembly → SEEP runnable
(SOOT Framework, Javassist)
> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections

Slide 35

Slide 35 text

Translation Process
Annotated Program.java → Extract TEs, SEs and accesses → Live variable analysis → TE and SE access code assembly → SEEP runnable
(SOOT Framework, Javassist)
> Extract state and state access patterns through static code analysis
> Generation of runnable code using TE and SE connections

Slide 36

Slide 36 text

Partitioned State Annotation

  @Partitioned Matrix userItem = new SeepMatrix();
  Matrix coOcc = new Matrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(coOcc, userItem);
  }

  Vector getRec(int user) {
      Vector userRow = userItem.getRow(user);
      Vector userRec = coOcc.multiply(userRow);
      return userRec;
  }

> @Partitioned field annotation indicates partitioned state

Slide 37

Slide 37 text

Partial State and Global Annotations

  @Partitioned Matrix userItem = new SeepMatrix();
  @Partial Matrix coOcc = new SeepMatrix();

  void addRating(int user, int item, int rating) {
      userItem.setElement(user, item, rating);
      updateCoOccurrence(@Global coOcc, userItem);
  }

> @Partial field annotation indicates partial state
> @Global annotates a variable to indicate access to all partial instances

Slide 38

Slide 38 text

Partial and Collection Annotations

  @Partitioned Matrix userItem = new SeepMatrix();
  @Partial Matrix coOcc = new SeepMatrix();

  Vector getRec(int user) {
      Vector userRow = userItem.getRow(user);
      @Partial Vector puRec = @Global coOcc.multiply(userRow);
      Vector userRec = merge(puRec);
      return userRec;
  }

  Vector merge(@Collection Vector[] v) {
      /*…*/
  }

> @Collection annotation indicates merge logic
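
The merge body is left empty on the slide. For a recommendation vector, one plausible merge is an element-wise sum of the partial results, since each partial instance contributes part of the overall score. The sketch below assumes the Vector helper (get/indices) from the sketch after Slide 18 and only illustrates what application-specific merge logic could look like; it is not the deck's implementation.

  import java.util.HashMap;
  import java.util.Map;

  class MergeExample {
      // combine partial recommendation vectors by element-wise addition
      static Vector merge(Vector[] partials) {
          Map<Integer, Integer> merged = new HashMap<>();
          for (Vector partial : partials) {
              for (int i : partial.indices()) {
                  merged.merge(i, partial.get(i), Integer::sum);
              }
          }
          return new Vector(merged);
      }
  }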

Slide 39

Slide 39 text

Outline
• Motivation
• SDG: Stateful Dataflow Graphs
• Handling distributed state in SDGs
• Translating Java programs to SDGs
• Checkpoint-based fault tolerance for SDGs (failures)
• Experimental evaluation

Slide 40

Slide 40 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG
> Task elements access local in-memory state
> Node failures may lead to state loss

Slide 41

Slide 41 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG: state held in RAM on physical nodes
> Task elements access local in-memory state
> Node failures may lead to state loss

Slide 42

Slide 42 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG: state held in RAM on physical nodes
> Task elements access local in-memory state
> Node failures may lead to state loss
Checkpointing state:
• No updates allowed while state is being checkpointed
• Checkpointing state should not impact the data processing path

Slide 43

Slide 43 text

Challenges of Making SDGs Fault Tolerant
Physical deployment of an SDG: state held in RAM on physical nodes
> Task elements access local in-memory state
> Node failures may lead to state loss
Checkpointing state:
• No updates allowed while state is being checkpointed
• Checkpointing state should not impact the data processing path
State backup:
• Backups are large and cannot be stored in memory
• Large writes to disk through the network have a high cost

Slide 44

Slide 44 text

Checkpoint Mechanism for Fault Tolerance
Asynchronous, lock-free checkpointing:
1. Freeze mutable state for checkpointing
2. Dirty state supports updates concurrently
3. Reconcile dirty state
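
A minimal sketch of the freeze/dirty/reconcile idea: while the frozen state is checkpointed in the background, updates land in a separate dirty map that shadows the snapshot for reads and is folded back in once the checkpoint completes. The data structures and method names are assumptions for illustration and ignore corner cases (deletes, races between the flag check and the write) that the real SEEP mechanism has to handle.

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  class CheckpointableState {
      private volatile boolean checkpointing = false;
      private final Map<Integer, Integer> state = new ConcurrentHashMap<>();   // frozen during a checkpoint
      private final Map<Integer, Integer> dirty = new ConcurrentHashMap<>();   // absorbs concurrent updates

      // 1. freeze mutable state: from now on, writes go to the dirty map,
      //    and the frozen `state` map can be serialized asynchronously
      void beginCheckpoint() { checkpointing = true; }

      void update(int key, int value) {
          if (checkpointing) {
              dirty.put(key, value);    // 2. dirty state supports updates concurrently
          } else {
              state.put(key, value);
          }
      }

      int read(int key) {
          Integer v = dirty.get(key);   // dirty values shadow the frozen snapshot
          return (v != null) ? v : state.getOrDefault(key, 0);
      }

      // 3. reconcile: fold dirty updates back into the main state
      void endCheckpoint() {
          state.putAll(dirty);
          dirty.clear();
          checkpointing = false;
      }
  }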

Slide 45

Slide 45 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 46

Slide 46 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 47

Slide 47 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 48

Slide 48 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 49

Slide 49 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 50

Slide 50 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 51

Slide 51 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 52

Slide 52 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery

Slide 53

Slide 53 text

Distributed M to N Checkpoint Backup
M to N distributed backup and parallel recovery
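
The deck does not show code for the M-to-N scheme, but the idea of splitting one node's checkpoint into chunks that are scattered over several backup nodes (so recovery can proceed in parallel) can be sketched as below. The chunk size, round-robin placement, and class name are assumptions for illustration, not the SEEP implementation.

  import java.util.ArrayList;
  import java.util.List;

  class MToNCheckpointBackup {
      // split a checkpoint into fixed-size chunks so they can be
      // scattered across backup nodes and restored in parallel
      static List<byte[]> split(byte[] checkpoint, int chunkSize) {
          List<byte[]> chunks = new ArrayList<>();
          for (int off = 0; off < checkpoint.length; off += chunkSize) {
              int len = Math.min(chunkSize, checkpoint.length - off);
              byte[] chunk = new byte[len];
              System.arraycopy(checkpoint, off, chunk, 0, len);
              chunks.add(chunk);
          }
          return chunks;
      }

      // place chunk i on backup node i mod N (round-robin over the N backup nodes)
      static int backupNodeFor(int chunkIndex, int numBackupNodes) {
          return chunkIndex % numBackupNodes;
      }
  }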

Slide 54

Slide 54 text

Evaluation of SDG Performance
How does mutable state impact performance?
How efficient are translated SDGs?
What is the throughput/latency trade-off?
Experimental set-up:
  - Amazon EC2 (c1 and m1 xlarge instances)
  - Private cluster (4-core 3.4 GHz Intel Xeon servers with 8 GB RAM)
  - Sun Java 7, Ubuntu 12.04, Linux kernel 3.10

Slide 55

Slide 55 text

Processing with Large Mutable State
[Figure: throughput (1,000 requests/s) and latency (ms) vs. workload (state read/write ratio)]
> addRating and getRec functions from the recommender algorithm, while changing the read/write ratio
Combines batch and online processing to serve fresh results over large mutable state

Slide 56

Slide 56 text

Efficiency of Translated SDG
[Figure: throughput (GB/s) vs. number of nodes, SDG vs. Spark]
> Batch-oriented, iterative logistic regression
Translated SDG achieves performance similar to a non-mutable dataflow

Slide 57

Slide 57 text

Latency/Throughput Tradeoff
[Figure: throughput (1,000 requests/s) vs. window size (ms): SDG, Naiad-LowLatency]
> Streaming word count query, reporting counts over windows
SDGs achieve high throughput while maintaining low latency

Slide 58

Slide 58 text

Latency/Throughput Tradeoff
[Figure: throughput (1,000 requests/s) vs. window size (ms): SDG, Naiad-LowLatency, Naiad-HighThroughput, Streaming Spark]
> Streaming word count query, reporting counts over windows
SDGs achieve high throughput while maintaining low latency

Slide 59

Slide 59 text

Latency/Throughput Tradeoff
[Figure: throughput (1,000 requests/s) vs. window size (ms): SDG, Naiad-LowLatency, Naiad-HighThroughput, Streaming Spark]
> Streaming word count query, reporting counts over windows
SDGs achieve high throughput while maintaining low latency

Slide 60

Slide 60 text

Summary
Running Java programs with the performance of current distributed dataflow frameworks
SDG: Stateful Dataflow Graphs
  - Abstractions for distributed mutable state
  - Annotations to disambiguate types of distributed state and state access
  - Checkpoint-based fault tolerance mechanism

Slide 61

Slide 61 text

Summary
Running Java programs with the performance of current distributed dataflow frameworks
SDG: Stateful Dataflow Graphs
  - Abstractions for distributed mutable state
  - Annotations to disambiguate types of distributed state and state access
  - Checkpoint-based fault tolerance mechanism
Thank you! Any questions?
@raulcfernandez
[email protected]
https://github.com/lsds/Seep/
https://github.com/raulcf/SEEPng/

Slide 62

Slide 62 text

BACKUP SLIDES

Slide 63

Slide 63 text

Scalability on State Size and Throughput
[Figure: throughput (million requests/s) and latency (ms) vs. aggregated memory (GB)]
> Increase state size in a mutable KV store
Support large state without compromising throughput or latency while staying fault tolerant

Slide 64

Slide 64 text

Iteration in SDG
> Local iteration supported by one node
> Iteration across TEs requires a cycle in the dataflow

Slide 65

Slide 65 text

Types of Annotations
• Partition
• Partial
• Global
• Partial
• Collection
• Data annotations
  - Batch
  - Stream

Slide 66

Slide 66 text

Overhead of SDG Fault Tolerance
[Figures: latency (ms) vs. state size (GB), and latency (ms) vs. checkpoint frequency (s), with and without fault tolerance]
The fault tolerance mechanism's impact on performance and latency is small.
State size and checkpointing frequency do not affect performance.

Slide 67

Slide 67 text

Fault Tolerance Overhead
[Figure: throughput (10,000 requests/s) and latency (ms) vs. aggregated memory (MB): SDG, Naiad-NoDisk, Naiad-Disk]

Slide 68

Slide 68 text

Recovery Times
[Figure: recovery time (s) vs. state size (GB) for 1-to-1, 2-to-1, 1-to-2, and 2-to-2 recovery]

Slide 69

Slide 69 text

Stragglers
[Figure: throughput (1,000 requests/s) and number of nodes over time (s)]

Slide 70

Slide 70 text

Fault Tolerance: Sync vs. Async
[Figure: throughput (1,000 requests/s) and latency (s) vs. state size (GB) for synchronous and asynchronous checkpointing]

Slide 71

Slide 71 text

Comparison to State-of-the-Art

  System      Large State   Mutable State   Low Latency   Iteration
  MapReduce   n/a           n/a             No            No
  Spark       n/a           n/a             No            Yes
  Storm       n/a           n/a             Yes           No
  Naiad       No            Yes             Yes           Yes
  SDG         Yes           Yes             Yes           Yes

SDGs are the first stateful, fault-tolerant model enabling execution of imperative code with explicit state

Slide 72

Slide 72 text

Characteristics of SDGs
> Runtime data parallelism (elasticity): adaptation to varying workloads and a mechanism against stragglers
> Support for cyclic graphs: efficiently represents iterative algorithms
> Low latency: pipelining tasks decreases latency

Slide 73

Slide 73 text

Bob   Local Expert
Hi, I have a query to run on “Big Data”
Ok, cool, tell me about it
I want to know sales per employee on Saturdays
… well … ok, come in 3 days
Well, this is actually pretty urgent…
… 2 days, I'm pretty busy
2 Days After
Hi! You have the results?
Yes, here you have your sales last Saturday
My sales? I meant all employee sales, and not only last Saturday
Oops, sorry for that, give me 2 days…

Slide 74

Slide 74 text

17TH ~ 18th NOV 2014 MADRID (SPAIN)