Slide 1

Declarative Sliding-Window Aggregations for Computations at the Edge
Christopher Meiklejohn, Machine Zone, Inc.
Peter Van Roy and Seyed H. Haeri (Hossein), Université catholique de Louvain
EdgeCom 2016, January 9th, 2016

Slide 2

What is Edge Computation?

Slides 3-7

Edge Computation
• Logical extremes of the network: applications, data, and computation
• Especially challenging where synchronization is hard
• “Internet of Things”: low power, limited memory and connectivity
• Mobile applications: offline operation with replicated, shared state
• How should we manage events generated at the device?

Slides 8-11

Traditional Approaches
• Centralized computation (D-Streams, Storm, Summingbird): stream all events to a centralized location for processing
• Most general approach, but expensive: events must be buffered while devices are offline, and operating the antenna is costly in power
• Design a distributed algorithm (Directed/Digest Diffusion, TAG): design an algorithm optimized for program dissemination and collection of results
• Least general, but efficient: the algorithm can be designed specifically to address unordered delivery and optimized for minimal state transmission

Slide 12

Can we design a general programming model for efficient distributed computation that can tolerate message delays, reordering, and duplication?

Slides 13-15

Contributions
• Extend previous work on “Lattice Processing” (Lasp): a declarative, functional programming model over distributed data structures (CRDTs)
• Extend our model with two new data structures: Pair and Bounded-LWW-Set
• Extend our model with dynamic scope: “dynamic” variables, where each node contains its own value for a given variable, which can be aggregated with a “dynamic” fold operation

Slide 16

Background: Conflict-Free Replicated Data Types (SSS 2011)

Slides 17-18

Conflict-Free Replicated Data Types
• Collection of types: sets, counters, registers, flags, maps
• Strong Eventual Consistency (SEC): objects that receive the same updates, regardless of order, will reach equivalent state

Slides 19-23

Observed-Remove Set example with three replicas RA, RB, and RC; internal state is (element, add-tags, remove-tags):
• add(1) at RA: RA observes {1}, state (1, {a}, {})
• add(1) at RC: RC observes {1}, state (1, {c}, {})
• remove(1) at RC: RC observes {}, state (1, {c}, {c})
• After the replicas merge, all three converge to {1} with state (1, {a, c}, {c}): the add tagged a was never observed by the remove, so the element survives (reproduced in the sketch below)
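The convergence above can be reproduced with a minimal Observed-Remove Set model. This is an illustrative sketch only; the module name, tag handling, and use of maps:merge_with/3 (OTP 24+) are assumptions, not the paper's implementation. An element is present if at least one of its add tags has not been removed.

-module(orset_sketch).
-export([new/0, add/3, remove/2, merge/2, value/1]).

%% State: #{Element => {AddTags, RemoveTags}}.
new() -> #{}.

%% An add tags the element with a unique token chosen by the acting replica.
add(Element, Tag, Set) ->
    {Adds, Removes} = maps:get(Element, Set, {[], []}),
    Set#{Element => {lists:usort([Tag | Adds]), Removes}}.

%% A remove tombstones only the add tags observed locally.
remove(Element, Set) ->
    {Adds, Removes} = maps:get(Element, Set, {[], []}),
    Set#{Element => {Adds, lists:usort(Adds ++ Removes)}}.

%% Merge is a pairwise union of add and remove tags (requires OTP 24+).
merge(SetA, SetB) ->
    Union = fun(_, {A1, R1}, {A2, R2}) ->
                {lists:usort(A1 ++ A2), lists:usort(R1 ++ R2)}
            end,
    maps:merge_with(Union, SetA, SetB).

%% An element is visible if some add tag has not been removed.
value(Set) ->
    [E || {E, {Adds, Removes}} <- maps:to_list(Set), Adds -- Removes =/= []].

Replaying the scenario: merging RA after add(1, a, new()) with RC after add(1, c, new()) followed by remove(1, ...) yields value [1], matching the converged state shown above.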

Slide 24

Background: Lattice Processing

Slides 25-27

Lattice Processing (Lasp)
• Distributed, deterministic dataflow: a programming model for “eventually consistent” computations
• Convergent data structures: the primary data abstraction is the CRDT
• Enables composition: provides functional composition of CRDTs that preserves the SEC property

Slides 28-32

%% Create initial set.
S1 = declare(set),

%% Add elements to initial set and update.
update(S1, {add, [1,2,3]}),

%% Create second set.
S2 = declare(set),

%% Apply map operation between S1 and S2.
map(S1, fun(X) -> X * 2 end, S2).

Slides 33-34

Lattice Processing (Lasp)
• Functional and set-theoretic operations on sets: product, intersection, union, filter, map, fold (a filter example follows below)
• Metadata computation: performs transformations on the internal metadata of CRDTs, allowing the creation of “composed” CRDTs
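For instance, a filter can follow the same shape as the map listing earlier; the three-argument filter signature below mirrors that listing and is an assumption rather than an exact API reference.

%% Create a source set and populate it.
S1 = declare(set),
update(S1, {add, [1,2,3,4]}),

%% Create a destination set.
S2 = declare(set),

%% Keep only the even elements; S2 is maintained as S1 changes.
filter(S1, fun(X) -> X rem 2 =:= 0 end, S2).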

Slide 35

Example Application: Computing Averages

Slides 36-37

Computing Aggregates
• Sensors generate events: fold a rolling set of events into a local, per-device average
• Merge local averages per device: fold the local averages across devices into a global average replicated at each device

Slides 38-43

Dataflow diagram: Sensor1 … SensorN each feed a Samples set (user-maintained input CRDT); a Fold operation derives a Local Avg per device (Lasp-maintained output CRDT); a second Fold combines the Local Avgs into a Global Avg replicated at every device.

Slides 44-50

%% Define a pair of counters to store the global average.
GlobalAverage = declare({counter, counter}, global_average),

%% Declare a dynamic variable bounded to the last 100 samples.
Samples = declare_dynamic({bounded_lww_set, 100}),

%% Define a local average, computed from the local Bounded-LWW-Set.
LocalAverage = declare_dynamic({counter, counter}),

%% Register an event handler with the sensor that is triggered each
%% time an event X occurs at a given timestamp T.
%% (`Actor' identifies the local replica performing the update.)
EventHandler = fun({X, T}) -> update(Samples, {add, X, T}, Actor) end,
register_event_handler(EventHandler),

%% Fold samples using the function `avg' into a local average.
fold(Samples, fun avg/2, LocalAverage),

%% Fold local averages using the function `sum_pairs' into a global average.
fold_dynamic(LocalAverage, fun sum_pairs/2, GlobalAverage)
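The helpers avg/2 and sum_pairs/2 referenced above are not defined on the slides. Ignoring the counter-CRDT plumbing, their plain-value analogues might look like this sketch; the module name and exact argument shapes are assumptions.

-module(avg_sketch).
-export([avg/2, sum_pairs/2]).

%% Accumulate one sample (a value tagged with a timestamp) into a
%% {Sum, Count} accumulator; the pair of counters tracks these totals.
avg({Value, _Timestamp}, {Sum, Count}) ->
    {Sum + Value, Count + 1}.

%% Combine two per-device {Sum, Count} pairs; the global average is
%% then Sum / Count at the reader.
sum_pairs({Sum1, Count1}, {Sum2, Count2}) ->
    {Sum1 + Sum2, Count1 + Count2}.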

Slide 51

Lasp Extensions: Semantics

Slides 52-55

Bounded LWW Set
• Bounded “Last-Writer-Wins” Set: same structure as the Observed-Remove Set, but enforces a maximum number of elements
• Objects tagged with local time: each object in the set is tagged with the local time at insertion
• Objects marked “removed” when the bound is exceeded: a tombstone marks objects as removed during insertions and merges (insertion rule sketched below)
• “Last-Writer-Wins” with a single writer: the single writer removes the non-determinism inherent in LWW registers with replicated state
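A minimal single-writer sketch of the insertion rule described above, assuming an illustrative {Value, Timestamp, Tombstoned} representation and unique, monotonically increasing local timestamps, and omitting merge; this is not the paper's implementation.

-module(bounded_lww_sketch).
-export([new/1, insert/3, value/1]).

%% State: {Bound, [{Value, Timestamp, Tombstoned}]}.
new(Bound) -> {Bound, []}.

insert(Value, Timestamp, {Bound, Elements0}) ->
    Elements = [{Value, Timestamp, false} | Elements0],
    Live = [E || {_, _, false} = E <- Elements],
    case length(Live) > Bound of
        true ->
            %% Tombstone the live element with the oldest timestamp.
            [{OldV, OldT, false} | _] = lists:keysort(2, Live),
            Tombstoned = lists:keyreplace(OldT, 2, Elements,
                                          {OldV, OldT, true}),
            {Bound, Tombstoned};
        false ->
            {Bound, Elements}
    end.

%% Observable value: the non-tombstoned elements.
value({_Bound, Elements}) ->
    [V || {V, _, false} <- Elements].

With bound 2, inserting 1, 2, 3 reproduces the trace on the next slides: 1 is tombstoned and value/1 returns [3, 2], i.e. the set {2, 3}.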

Slides 56-59

Bounded-LWW-Set trace with bound 2; each entry is (value, timestamp, tombstoned):
• Event handler inserts 1: Samples = {1}, state {(1, T1, F)}
• Event handler inserts 2: Samples = {1, 2}, state {(1, T1, F), (2, T2, F)}
• Event handler inserts 3: the bound is exceeded and 1 is tombstoned: Samples = {2, 3}, state {(1, T1, T), (2, T2, F), (3, T3, F)}

Slides 60-61

Lasp Fold
• Fold: computes an aggregate over a set into another type of CRDT
• Function invariants: the operation must be associative, commutative, and have an inverse operation

Slides 62-65

Fold trace: Samples (bound 2) folded into Local Avg, a {Sum, Count} pair of counters; internally each counter records increments keyed by timestamp and a decrement for each tombstoned sample:
• Samples = {1}: Local Avg = {1, 1}
• Samples = {1, 2}: Local Avg = {3, 2}
• Samples = {2, 3}, with 1 tombstoned: Local Avg = {5, 2} (the arithmetic is sketched below)
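The last step of this trace is where the inverse required on the previous slide pays off: the aggregate is patched incrementally rather than refolded from scratch. A plain-arithmetic sketch with illustrative function names:

%% Fold step and its inverse over a {Sum, Count} aggregate.
Plus  = fun(Value, {Sum, Count}) -> {Sum + Value, Count + 1} end,
Minus = fun(Value, {Sum, Count}) -> {Sum - Value, Count - 1} end,

%% Samples {1, 2} fold to {3, 2}; when 1 is tombstoned and 3 inserted,
%% the aggregate is updated incrementally to {5, 2}, as in the trace.
{3, 2} = lists:foldl(Plus, {0, 0}, [1, 2]),
{5, 2} = Plus(3, Minus(1, {3, 2})).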

Slides 66-67

Dynamic Scope
• Variables exist on all nodes: a dynamic variable exists on every node under a given identifier, but each node holds its own value, which is not replicated
• Dynamic fold operation: combines the per-node values with a merge, through pairwise synchronization across nodes, until a fixed point is reached (a toy model follows below)
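A toy, single-process model of this pairwise synchronization, assuming a ring of nodes and a user-supplied merge that is commutative, associative, and idempotent; the module and function names are illustrative, not the paper's implementation.

-module(dynamic_fold_sketch).
-export([dynamic_fold/2]).

%% One gossip round: each node merges its value with its right neighbour's.
gossip_round(States, Merge) ->
    Rotated = tl(States) ++ [hd(States)],
    [Merge(A, B) || {A, B} <- lists:zip(States, Rotated)].

%% Repeat rounds until a round changes nothing (the fixed point); every
%% node then holds the merge of all initial values.
dynamic_fold(States, Merge) ->
    case gossip_round(States, Merge) of
        States -> hd(States);
        Next   -> dynamic_fold(Next, Merge)
    end.

%% Example: dynamic_fold([3, 1, 2], fun erlang:max/2) returns 3.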

Slides 68-71

Propagation of the dynamic fold (Local Avg into Global Avg):
1.) a “bind” message is received
2.) the fold operation is re-executed with the new value
3.) the resulting “bind” is broadcast to peers

Slide 72

In Summary

Slides 73-75

Related Work
• Directed / Digest Diffusion: energy-efficient aggregation and program dissemination, with optimizations when aggregations are monotonic; however, not tolerant to some network anomalies
• Tiny AGgregation (TAG): a declarative method for data collection across sensors using SQL-like syntax; however, not a general programming model
• PVARS: similar to *Lisp parallel variables, where each processor has its own value and an operation can be applied across nodes

Slides 76-78

Future Work
• Quantitative evaluation: evaluation and optimization of the prototype implementation in Erlang
• Optimizations to reduce metadata: apply known CRDT optimizations to both the fold operation and the data structures to reduce space complexity

Slide 79

Thanks!
Christopher Meiklejohn
@cmeik