Declarative, Sliding Window Computations at the Edge

EdgeCom 2016

Christopher Meiklejohn

January 09, 2016

Transcript

  1. Declarative Sliding-Window Aggregations For Computations at the Edge
     Christopher Meiklejohn, Machine Zone, Inc.
     Peter Van Roy, Seyed H. Haeri (Hossein), Université catholique de Louvain
     EdgeCom 2016, January 9th, 2016
  5. Edge Computation
     • Logical extremes of the network: applications, data, and computation
     • Especially challenging where synchronization is hard
     • “Internet of Things”: low power, limited memory and connectivity
     • Mobile applications: offline operation with replicated, shared state
     • How should we manage events generated at the device?
  8. Traditional Approaches
     • Centralized computation (D-Streams, Storm, Summingbird): stream all events to a centralized location for processing
     • Most general approach, however expensive: events must be buffered while devices are offline, and operating the antenna has power requirements
     • Design a distributed algorithm (Directed/Digest Diffusion, TAG): design an algorithm optimized for program dissemination and collection of results
     • Least general, however efficient: the algorithm can be designed specifically to address unordered delivery and optimized for minimal state transmission
  9. Can we design a general programming model for efficient distributed computation that can tolerate message delays, reordering, and duplication?
  12. Contributions
     • Extend previous work on “Lattice Processing” (Lasp): a declarative, functional programming model over distributed data structures (CRDTs)
     • Extend our model with new data structures: two new data structures, Pair and Bounded-LWW-Set
     • Extend our model with dynamic scope: “dynamic” variables, where each node holds its own value for a given variable, and these values can be aggregated with a “dynamic” fold operation
  13. Conflict-Free Replicated Data Types
     • Collection of types: sets, counters, registers, flags, maps
     • Strong Eventual Consistency (SEC): objects that receive the same updates, regardless of order, will reach equivalent state
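  15. Figure: an Observed-Remove Set replicated at RA, RB, and RC; each internal state is written (element, add tags, remove tags).
     • One replica performs add(1): value {1}, state (1, {a}, {})
     • Another replica performs add(1): value {1}, state (1, {c}, {}); then remove(1): value {}, state (1, {c}, {c})
     • After the replicas synchronize, all three hold {1} with state (1, {a, c}, {c})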
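The replica diagram can be replayed with a small Observed-Remove Set. This is a minimal sketch, not Lasp's implementation: the module `orset_sketch` and its functions are hypothetical, and it assumes tag `a` comes from RA and tag `c` from RC. Each element maps to its add tags and remove tags, matching the diagram's (element, add tags, remove tags) notation; merge takes the per-element union of both tag sets, so replicas that have seen the same updates agree regardless of the order in which they merged.

```erlang
-module(orset_sketch).
-export([new/0, add/3, remove/2, merge/2, value/1, demo/0]).

%% State: #{Element => {AddTags, RemoveTags}}, both tag sets are ordsets.
new() -> #{}.

%% Add Elem under a fresh, unique Tag (the diagram uses a and c).
add(Elem, Tag, State) ->
    {Adds, Removes} = maps:get(Elem, State, {[], []}),
    State#{Elem => {ordsets:add_element(Tag, Adds), Removes}}.

%% Remove tombstones only the add tags this replica has observed.
remove(Elem, State) ->
    case maps:find(Elem, State) of
        {ok, {Adds, Removes}} ->
            State#{Elem => {Adds, ordsets:union(Adds, Removes)}};
        error ->
            State
    end.

%% Per-element union of tag sets: commutative, associative, idempotent.
merge(A, B) ->
    maps:fold(fun(Elem, {AddsB, RemovesB}, Acc) ->
                      {AddsA, RemovesA} = maps:get(Elem, Acc, {[], []}),
                      Acc#{Elem => {ordsets:union(AddsA, AddsB),
                                    ordsets:union(RemovesA, RemovesB)}}
              end, A, B).

%% An element is visible while some add tag has not been removed.
value(State) ->
    lists:sort([Elem || {Elem, {Adds, Removes}} <- maps:to_list(State),
                        ordsets:subtract(Adds, Removes) =/= []]).

%% Replays the diagram, then merges the replicas in three different orders.
demo() ->
    RA = add(1, a, new()),
    RB = new(),
    RC = remove(1, add(1, c, new())),
    Merged = [merge(merge(RA, RB), RC),
              merge(RC, merge(RB, RA)),
              merge(merge(RB, RC), RA)],
    [value(S) || S <- Merged].
```

Because `remove/2` only tombstones the tags it has observed, the concurrent add tagged `a` survives, which is why every merge order ends at {1}.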
  17. Lattice Processing (Lasp)
     • Distributed, deterministic dataflow: a programming model for “eventually consistent” computations
     • Convergent data structures: the primary data abstraction is the CRDT
     • Enables composition: provides functional composition of CRDTs that preserves the SEC property
  18. %% Create initial set.
      S1 = declare(set),
      %% Add elements to initial set and update.
      update(S1, {add, [1,2,3]}),
      %% Create second set.
      S2 = declare(set),
      %% Apply map operation between S1 and S2.
      map(S1, fun(X) -> X * 2 end, S2).
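One way to see why `S2` stays convergent is to run the map over the set's metadata rather than its value: each output element keeps the tags of the input elements that produced it. This is a sketch under that assumption, using the (element, add tags, remove tags) representation from the replica diagram; the module `lasp_map_sketch` and the tag atoms `t1`, `t2`, `t3` are hypothetical, and Lasp's actual code may differ.

```erlang
-module(lasp_map_sketch).
-export([map/2, value/1, demo/0]).

%% Input and output states: #{Element => {AddTags, RemoveTags}} (ordsets).
map(Fun, State) ->
    maps:fold(fun(Elem, {Adds, Removes}, Acc) ->
                      Out = Fun(Elem),
                      {OutAdds, OutRemoves} = maps:get(Out, Acc, {[], []}),
                      %% Inputs that map to the same output pool their tags.
                      Acc#{Out => {ordsets:union(Adds, OutAdds),
                                   ordsets:union(Removes, OutRemoves)}}
              end, #{}, State).

%% Visible elements: those with an add tag that has not been removed.
value(State) ->
    lists:sort([E || {E, {Adds, Removes}} <- maps:to_list(State),
                     ordsets:subtract(Adds, Removes) =/= []]).

demo() ->
    %% S1 after update(S1, {add, [1,2,3]}); each add carries a unique tag.
    S1 = #{1 => {[t1], []}, 2 => {[t2], []}, 3 => {[t3], []}},
    S2 = map(fun(X) -> X * 2 end, S1),
    %% Removing 2 from S1 tombstones t2; re-running map retracts 4 from S2.
    S1b = S1#{2 => {[t2], [t2]}},
    {value(S2), value(map(fun(X) -> X * 2 end, S1b))}.
```

`demo()` returns the value of `S2` before and after 2 is removed from `S1`. Because colliding outputs pool their tags, an output element disappears only once every input that produced it has been removed.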
  24. Lattice Processing (Lasp)
     • Functional and set-theoretic operations on sets: product, intersection, union, filter, map, fold
     • Metadata computation: performs transformations on the internal metadata of CRDTs, allowing the creation of “composed” CRDTs
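Filter and union can be written over the same kind of metadata. In this hypothetical sketch (`lasp_setops_sketch`, assuming element to {AddTags, RemoveTags} states as in the replica diagram), filter keeps matching elements together with their tags, so a tombstone that later reaches the input also removes the element from the output when the filter is re-run, and union takes the per-element union of tag sets. Product, intersection, and fold are not shown.

```erlang
-module(lasp_setops_sketch).
-export([filter/2, union/2, value/1]).

%% States: #{Element => {AddTags, RemoveTags}}, tag sets as ordsets.

%% Keep matching elements with their tags, so input tombstones carry over
%% to the output whenever the filter is re-run.
filter(Pred, State) ->
    maps:filter(fun(Elem, _Tags) -> Pred(Elem) end, State).

%% Per-element union of add tags and of remove tags.
union(A, B) ->
    maps:fold(fun(Elem, {AddsB, RemovesB}, Acc) ->
                      {AddsA, RemovesA} = maps:get(Elem, Acc, {[], []}),
                      Acc#{Elem => {ordsets:union(AddsA, AddsB),
                                    ordsets:union(RemovesA, RemovesB)}}
              end, A, B).

%% Visible elements: those with an add tag that has not been removed.
value(State) ->
    lists:sort([E || {E, {Adds, Removes}} <- maps:to_list(State),
                     ordsets:subtract(Adds, Removes) =/= []]).
```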
  25. Computing Aggregates
     • Sensors generate events: fold a rolling set of events into a local device average
     • Merge local averages per device: fold local averages across devices into a global average replicated at each device
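Folding (sum, count) pairs rather than averages keeps the global result exact: averaging per-device averages would weight every device equally regardless of how many samples it holds. Writing device $i$'s local pair as $(s_i, c_i)$, the pair form used in the fold trace later in the deck, for $n$ devices:

$$
\bar{x} = \frac{\sum_{i=1}^{n} s_i}{\sum_{i=1}^{n} c_i}
$$

For example, a device with samples 1 and 2 contributes (3, 2) and a device with sample 3 contributes (3, 1), so the global pair is (6, 3) and $\bar{x} = 2$, whereas the mean of the two local averages is $(1.5 + 3)/2 = 2.25$.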
  26. Figure: dataflow from Sensor1 … SensorN. Each sensor's Samples are folded into a Local Avg, and the Local Avgs are folded into a Global Avg. Legend: Lasp Operation; Input: User-Maintained CRDT; Output: Lasp-Maintained CRDT.
  32. %% Define a pair of counters to store the global average.
      GlobalAverage = declare({counter, counter}, global_average),
      %% Declare a dynamic variable: a Bounded-LWW set of at most 100 samples.
      Samples = declare_dynamic({bounded_lww_set, 100}),
      %% Define a local average; computed from the local Bounded-LWW set.
      LocalAverage = declare_dynamic({counter, counter}),
      %% Register an event handler with the sensor that is triggered each
      %% time an event X is triggered at a given timestamp T.
      %% Actor is not bound on the slide; it is assumed to identify this node.
      EventHandler = fun({X, T}) -> update(Samples, {add, X, T}, Actor) end,
      register_event_handler(EventHandler),
      %% Fold samples using the function `avg' into a local average.
      fold(Samples, fun avg/2, LocalAverage),
      %% Fold local averages using the function `sum_pairs' into a global average.
      fold_dynamic(LocalAverage, fun sum_pairs/2, GlobalAverage).
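The slide uses `avg/2` and `sum_pairs/2` without defining them. The following is a guess at their shape, assuming fold functions take an element and an accumulator (as in `lists:foldl/3`), that sample values are numbers, and that both averages are {Sum, Count} pairs as in the deck's fold trace. The module `avg_sketch` and the helpers `local_average/1`, `global_average/1`, and `mean/1` are hypothetical and run on plain lists rather than Lasp variables.

```erlang
-module(avg_sketch).
-export([avg/2, sum_pairs/2, local_average/1, global_average/1, mean/1]).

%% Hypothetical fold function for samples: add one sample value to {Sum, Count}.
avg(Value, {Sum, Count}) ->
    {Sum + Value, Count + 1}.

%% Hypothetical fold function for local averages: add one device's pair.
sum_pairs({Sum, Count}, {AccSum, AccCount}) ->
    {AccSum + Sum, AccCount + Count}.

%% Fold one device's samples into its local {Sum, Count} pair.
local_average(Samples) ->
    lists:foldl(fun avg/2, {0, 0}, Samples).

%% Fold every device's local pair into the global pair.
global_average(LocalAverages) ->
    lists:foldl(fun sum_pairs/2, {0, 0}, LocalAverages).

%% Divide only when reading; the folded state stays a pair of integers.
mean({_Sum, 0}) -> undefined;
mean({Sum, Count}) -> Sum / Count.
```

Both fold functions are associative and commutative, so the result does not depend on the order in which samples or devices are visited; division happens only when the pair is read.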
  42. Bounded LWW Set
     • Bounded “Last-Writer-Wins” Set: same structure as the Observed-Remove Set, but enforces a maximum number of elements
     • Objects tagged with local time: each object in the set is tagged with the local time at insertion
     • Objects marked “removed” when bound exceeded: a tombstone marks objects as removed when performing insertions and merges
     • “Last-Writer-Wins” with single writer: a single writer removes the non-determinism otherwise inherent in LWW registers with replicated state
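  44. Figure: Samples (bound 2), fed by the event handler; each entry is written (element, timestamp, removed flag).
     • Event 1: {1}, state {(1, T1, F)}
     • Event 2: {1, 2}, state {(1, T1, F), (2, T2, F)}
     • Event 3: {2, 3}, state {(1, T1, T), (2, T2, F), (3, T3, F)}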
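The bound-2 trace can be reproduced with a single-writer sketch of the Bounded LWW Set. It assumes entries are keyed by (element, timestamp), that timestamps come from the writer's increasing local clock (integers 1, 2, 3 stand in for T1, T2, T3), and that merge unions entries, lets a tombstone on either side win, and then re-applies the bound. The module `bounded_lww_sketch` is hypothetical.

```erlang
-module(bounded_lww_sketch).
-export([new/1, add/3, merge/2, value/1, state/1]).

%% State: {Bound, Entries}, where Entries = #{{Elem, Time} => Removed}.
new(Bound) -> {Bound, #{}}.

%% Single writer: Time comes from the writer's local, increasing clock.
add(Elem, Time, {Bound, Entries}) ->
    enforce_bound({Bound, Entries#{{Elem, Time} => false}}).

%% Union the entries (a tombstone on either side wins), then re-apply the bound.
merge({Bound, A}, {Bound, B}) ->
    Merged = maps:fold(fun(Key, Removed, Acc) ->
                               maps:put(Key, Removed orelse maps:get(Key, Acc, false), Acc)
                       end, A, B),
    enforce_bound({Bound, Merged}).

%% Tombstone the oldest live entries until at most Bound remain live.
enforce_bound({Bound, Entries}) ->
    Live = live(Entries),
    Expired = lists:sublist(Live, max(0, length(Live) - Bound)),
    {Bound, lists:foldl(fun(Key, Acc) -> Acc#{Key := true} end, Entries, Expired)}.

%% Live {Elem, Time} keys, oldest first.
live(Entries) ->
    lists:keysort(2, [Key || {Key, false} <- maps:to_list(Entries)]).

%% Visible elements, oldest first.
value({_Bound, Entries}) ->
    [Elem || {Elem, _Time} <- live(Entries)].

%% Internal state as (element, timestamp, removed) triples, ordered by time.
state({_Bound, Entries}) ->
    lists:keysort(2, [{Elem, Time, Removed} || {{Elem, Time}, Removed} <- maps:to_list(Entries)]).
```

After the third event, `state/1` returns [{1, 1, true}, {2, 2, false}, {3, 3, false}], the same triples as the trace, and `value/1` returns [2, 3].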
  45. Lasp Fold
     • Fold: computes an aggregate over a set into another type of CRDT
     • Function invariants: the operation must be associative, commutative, and have an inverse operation (the inverse retracts the contribution of a removed element)
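Associativity and commutativity let elements be folded in any order, and the inverse lets the fold retract an element when it is removed instead of recomputing from scratch. The hypothetical sketch below (`fold_invariant_sketch`) applies the same additions and removals in every possible order, using addition with subtraction as its inverse.

```erlang
-module(fold_invariant_sketch).
-export([apply_events/4, permutations/1, demo/0]).

%% Apply set deltas to an accumulator: Op for additions, Inv for removals.
apply_events(Events, Op, Inv, Init) ->
    lists:foldl(fun({add, X}, Acc) -> Op(X, Acc);
                   ({remove, X}, Acc) -> Inv(X, Acc)
                end, Init, Events).

%% Every ordering of a list of distinct items.
permutations([]) -> [[]];
permutations(L) -> [[H | T] || H <- L, T <- permutations(L -- [H])].

%% Sum with subtraction as its inverse: associative, commutative, invertible.
demo() ->
    Events = [{add, 1}, {add, 2}, {remove, 1}, {add, 3}],
    Op = fun(X, Acc) -> Acc + X end,
    Inv = fun(X, Acc) -> Acc - X end,
    %% Collect the distinct results over all 24 delivery orders.
    lists:usort([apply_events(P, Op, Inv, 0) || P <- permutations(Events)]).
```

`demo()` returns [5]: every ordering of the four events ends at 1 + 2 - 1 + 3.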
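  48. Figure: folding Samples (bound 2) into Local Avg, a pair of counters {sum, count}. Each counter is shown as its increments and its decrements, tagged with the sample's timestamp.
     • Samples {1}, state {(1, T1, F)}: sum {(T1, 1)}, {}; count {(T1, 1)}, {}; Local Avg {1, 1}
     • Samples {1, 2}, state {(1, T1, F), (2, T2, F)}: sum {(T1, 1), (T2, 2)}, {}; count {(T1, 1), (T2, 1)}, {}; Local Avg {3, 2}
     • Samples {2, 3}, state {(1, T1, T), (2, T2, F), (3, T3, F)}: sum {(T1, 1), (T2, 2), (T3, 3)}, {(T1, 1)}; count {(T1, 1), (T2, 1), (T3, 1)}, {(T1, 1)}; Local Avg {5, 2}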
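The fold trace can be replayed by folding the (element, timestamp, removed) entries into a pair of counters whose increments and decrements are tagged by timestamp. This sketch is an interpretation of the figure, not Lasp's fold; the module `lww_fold_sketch` is hypothetical, and integers 1, 2, 3 stand in for T1, T2, T3.

```erlang
-module(lww_fold_sketch).
-export([fold_avg/1, value/1]).

%% Input: Bounded-LWW entries as {Elem, Time, Removed} triples.
%% Output: {{SumInc, SumDec}, {CountInc, CountDec}}, where each set holds
%% {Time, Amount} entries; keying by Time means a set-union merge of two
%% such states would count every contribution once.
fold_avg(Entries) ->
    lists:foldl(fun contribute/2, {{[], []}, {[], []}}, Entries).

%% Every entry adds its value to the sum and 1 to the count; a tombstoned
%% entry is then retracted with the inverse (a matching decrement).
contribute({Elem, Time, Removed}, {{SumInc, SumDec}, {CountInc, CountDec}}) ->
    SumInc1 = ordsets:add_element({Time, Elem}, SumInc),
    CountInc1 = ordsets:add_element({Time, 1}, CountInc),
    case Removed of
        true ->
            {{SumInc1, ordsets:add_element({Time, Elem}, SumDec)},
             {CountInc1, ordsets:add_element({Time, 1}, CountDec)}};
        false ->
            {{SumInc1, SumDec}, {CountInc1, CountDec}}
    end.

%% Read the pair: each counter is its increments minus its decrements.
value({{SumInc, SumDec}, {CountInc, CountDec}}) ->
    {total(SumInc) - total(SumDec), total(CountInc) - total(CountDec)}.

total(Set) -> lists:sum([Amount || {_Time, Amount} <- Set]).
```

Folding the last state, [{1, 1, true}, {2, 2, false}, {3, 3, false}], yields the value {5, 2}: the tombstoned sample is first added and then retracted by the inverse.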
  50. Dynamic Scope
     • Variables exist on all nodes: a variable with a given identifier exists on every node, and each node holds its own value, which is not replicated
     • Dynamic Fold operation: combines the values across all nodes with a merge, through pairwise synchronization, until a fixed point is reached
  51. Figure: Local Avg → Fold → Global Avg, with the dynamic fold's message flow:
     1.) “bind” message received
     2.) fold operation re-executed with the new value
     3.) “bind” broadcast to peers
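The dynamic fold can be simulated with plain message passing. This sketch assumes each node's contribution is a {Sum, Count} pair written and versioned only by that node, so merging keeps the higher version per node; that choice, the module `bind_sketch`, and the queue-based delivery loop are assumptions, not Lasp's protocol. Each delivery follows the three numbered steps, and a node broadcasts only when its state changed, so the exchange stops once every node reaches the same fixed point.

```erlang
-module(bind_sketch).
-export([run/2, global/1]).

%% Fold state on each node: #{Node => {Version, {Sum, Count}}}. Each entry is
%% written only by its own node, so keeping the higher version is a safe merge.
merge(A, B) ->
    maps:fold(fun(Node, {V, Pair}, Acc) ->
                      case maps:get(Node, Acc, {-1, undefined}) of
                          {V0, _} when V0 >= V -> Acc;
                          _ -> Acc#{Node => {V, Pair}}
                      end
              end, A, B).

%% The fold itself: sum every node's {Sum, Count} pair into a global pair.
global(State) ->
    maps:fold(fun(_Node, {_V, {S, C}}, {AccS, AccC}) -> {AccS + S, AccC + C} end,
              {0, 0}, State).

%% Nodes: #{Node => State}; Peers: #{Node => [Node]}. Every node first sends a
%% "bind" carrying its own state to its peers. Returns #{Node => {State, Global}}.
run(Nodes, Peers) ->
    Start = maps:map(fun(_Node, S) -> {S, global(S)} end, Nodes),
    Initial = [{To, FromState} || {From, FromState} <- maps:to_list(Nodes),
                                  To <- maps:get(From, Peers)],
    deliver(Initial, Start, Peers).

deliver([], Nodes, _Peers) ->
    Nodes;
deliver([{To, Incoming} | Rest], Nodes, Peers) ->
    {Old, _OldGlobal} = maps:get(To, Nodes),
    %% 1.) "bind" received: merge the incoming state into the local one.
    New = merge(Old, Incoming),
    case New =:= Old of
        true ->
            %% Nothing new: no re-execution and no broadcast.
            deliver(Rest, Nodes, Peers);
        false ->
            %% 2.) Re-execute the fold with the new value.
            Global = global(New),
            %% 3.) Broadcast "bind" with the new state to every peer.
            Out = [{Peer, New} || Peer <- maps:get(To, Peers)],
            deliver(Rest ++ Out, Nodes#{To := {New, Global}}, Peers)
    end.
```

With three nodes connected in a line (a with b, b with c) holding local pairs (3, 2), (3, 1), and (0, 0), every node ends with the global pair (6, 3), even though a and c never exchange messages directly.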
  57. Related Work
     • Directed / Digest Diffusion: energy-efficient aggregation and program dissemination, with optimizations when aggregations are monotonic; however, not tolerant to some network anomalies
     • Tiny AGgregation (TAG): declarative method for data collection across sensors using SQL-like syntax; however, not a general programming model
     • PVARS: similar to *Lisp parallel variables, where each processor had its own value and could apply an operation across nodes
  58. Future Work
     • Quantitative evaluation: evaluation and optimization of the prototype implementation in Erlang
     • Optimizations to reduce metadata: apply known CRDT optimizations to both the fold operation and the data structures to reduce space complexity