Declarative, Sliding Window Computations at the Edge

EdgeCom 2016

Christopher Meiklejohn

January 09, 2016

Transcript

  1. Declarative Sliding-Window Aggregations for Computations at the Edge. Christopher Meiklejohn, Machine Zone, Inc.; Peter Van Roy, Seyed H. Haeri (Hossein), Université catholique de Louvain. EdgeCom 2016, January 9th, 2016
  2. What is Edge Computation?

  7. Edge Computation
     • Logical extremes of the network: applications, data, and computation
     • Especially challenging where synchronization is hard
     • “Internet of Things”: low power, limited memory and connectivity
     • Mobile applications: offline operation with replicated, shared state
     • How should we manage events generated at the device?
  11. Traditional Approaches
     • Centralized computation (D-Streams, Storm, Summingbird): stream all events to a centralized location for processing
     • Most general approach, but expensive: events must be buffered while devices are offline, and operating the antenna has power costs
     • Design a distributed algorithm (Directed/Digest Diffusion, TAG): an algorithm optimized for program dissemination and collection of results
     • Least general, but efficient: the algorithm can be designed specifically to address unordered delivery and optimized for minimal state transmission
  12. Can we design a general programming model for efficient distributed computation that can tolerate message delays, reordering, and duplication?
  15. Contributions
     • Extend previous work on “Lattice Processing” (Lasp): a declarative, functional programming model over distributed data structures (CRDTs)
     • Extend the model with two new data structures: Pair and Bounded-LWW-Set
     • Extend the model with dynamic scope: “dynamic” variables, where each node holds a unique value for a given variable, which can be aggregated with a “dynamic” fold operation
  16. Background: Conflict-Free Replicated Data Types (SSS 2011)

  18. Conflict-Free Replicated Data Types
     • Collection of types: sets, counters, registers, flags, maps
     • Strong Eventual Consistency (SEC): objects that receive the same updates, regardless of order, reach equivalent state
  23. [Diagram: an Observed-Remove Set across three replicas RA, RB, RC. RA performs add(1), tagging it a: (1, {a}, {}). RC concurrently performs add(1) with tag c, then remove(1), tombstoning c: (1, {c}, {c}). After merging, all three replicas converge to the value {1} with state (1, {a, c}, {c}): the remove covered only the tag it had observed, so the concurrently added element survives.]
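The diagram's scenario can be replayed with a minimal Observed-Remove Set sketch. This is an illustrative Python model, not the paper's implementation: each element carries unique add-tags, and a removal tombstones only the tags observed locally, so concurrent adds survive a merge.

```python
# Minimal Observed-Remove Set sketch (illustrative, not the paper's code).
class ORSet:
    def __init__(self):
        self.adds = set()      # observed (element, unique_tag) pairs
        self.removes = set()   # tombstoned (element, unique_tag) pairs

    def add(self, element, tag):
        self.adds.add((element, tag))

    def remove(self, element):
        # Tombstone only the tags observed at this replica.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        # Merge is a set union on both components: commutative,
        # associative, and idempotent, which yields SEC.
        merged = ORSet()
        merged.adds = self.adds | other.adds
        merged.removes = self.removes | other.removes
        return merged

    def value(self):
        return {e for (e, t) in self.adds - self.removes}

# The slide's scenario: RA adds 1 (tag a); RC concurrently adds 1
# (tag c), then removes it. RA's tag was never observed by the
# remove, so the element survives the merge.
ra, rc = ORSet(), ORSet()
ra.add(1, "a")
rc.add(1, "c")
rc.remove(1)
merged = ra.merge(rc)
print(merged.value())  # {1}
```

Merging in either order gives the same state, which is the convergence property the slides illustrate.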
  24. Background: Lattice Processing

  27. Lattice Processing (Lasp)
     • Distributed, deterministic dataflow: a programming model for “eventually consistent” computations
     • Convergent data structures: the primary data abstraction is the CRDT
     • Enables composition: provides functional composition of CRDTs that preserves the SEC property
  28. 
      %% Create initial set.
      S1 = declare(set),
      %% Add elements to initial set and update.
      update(S1, {add, [1,2,3]}),
      %% Create second set.
      S2 = declare(set),
      %% Apply map operation between S1 and S2.
      map(S1, fun(X) -> X * 2 end, S2).
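The point of the Lasp snippet is that `map` establishes a standing binding: S2 is re-derived whenever S1 changes. A hedged Python sketch of that dataflow behavior (illustrative only; `Dataflow`, `declare`, and `update_add` are made-up names, not Lasp's Erlang API):

```python
# Illustrative dataflow sketch: a map binding that re-derives the
# output set whenever the input set is updated.
class Dataflow:
    def __init__(self):
        self.values = {}    # variable id -> set contents
        self.bindings = []  # (source, fn, dest) map bindings

    def declare(self, var):
        self.values[var] = set()

    def update_add(self, var, elements):
        self.values[var] |= set(elements)
        self._propagate()

    def map(self, source, fn, dest):
        self.bindings.append((source, fn, dest))
        self._propagate()

    def _propagate(self):
        # Re-run every binding; with grow-only inputs this is
        # monotone, so repeated re-execution converges.
        for source, fn, dest in self.bindings:
            self.values[dest] |= {fn(x) for x in self.values[source]}

df = Dataflow()
df.declare("S1")
df.update_add("S1", [1, 2, 3])
df.declare("S2")
df.map("S1", lambda x: x * 2, "S2")
print(df.values["S2"])   # {2, 4, 6}
df.update_add("S1", [4])  # a later update flows through the binding
print(df.values["S2"])   # {2, 4, 6, 8}
```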
  34. Lattice Processing (Lasp)
     • Functional and set-theoretic operations on sets: product, intersection, union, filter, map, fold
     • Metadata computation: performs transformations on the internal metadata of CRDTs, allowing creation of “composed” CRDTs
  35. Example Application: Computing Averages

  37. Computing Aggregates
     • Sensors generate events: fold a rolling set of events into a local device average
     • Merge local averages per device: fold local averages across devices into a global average replicated at each device
  38. [Diagram: dataflow for the average computation. Each sensor (Sensor1 … SensorN) maintains a user-maintained Samples CRDT as input; a Lasp fold operation derives a Lasp-maintained Local Avg from each; a second fold combines the local averages into a Global Avg replicated at each device.]
  44. 
      %% Define a pair of counters to store the global average.
      GlobalAverage = declare({counter, counter}, global_average),
      %% Declare a dynamic variable.
      Samples = declare_dynamic({bounded_lww_set, 100}),
      %% Define a local average, computed from the local Bounded-LWW set.
      LocalAverage = declare_dynamic({counter, counter}),
      %% Register an event handler with the sensor that is triggered each
      %% time an event X is generated at a given timestamp T.
      EventHandler = fun({X, T}) ->
          update(Samples, {add, X, T}, Actor)
      end,
      register_event_handler(EventHandler),
      %% Fold samples using the function `avg' into a local average.
      fold(Samples, fun avg/2, LocalAverage),
      %% Fold local averages using the function `sum_pairs' into a global average.
      fold_dynamic(LocalAverage, fun sum_pairs/2, GlobalAverage)
  51. Lasp Extensions: Semantics

  55. Bounded LWW Set
     • Bounded “Last-Writer-Wins” Set: same structure as the Observed-Remove Set, but enforces a maximum number of elements
     • Objects tagged with local time: each object in the set is tagged with the local time at insertion
     • Objects marked “removed” when the bound is exceeded: a tombstone marks objects as removed during insertions and merges
     • “Last-Writer-Wins” with a single writer: a single writer removes the non-determinism inherent in LWW registers with replicated state
  59. [Diagram: a Samples set with bound 2 fed by the event handler. Inserting 1 yields {(1, T1, F)} (value {1}); inserting 2 yields {(1, T1, F), (2, T2, F)} (value {1, 2}); inserting 3 exceeds the bound, so the oldest element is tombstoned: {(1, T1, T), (2, T2, F), (3, T3, F)} (value {2, 3}).]
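The behavior in the diagram can be sketched concretely. This is an illustrative Python model of the semantics described on the slides, not the paper's Erlang structure: elements carry an insertion timestamp and a tombstone flag, and exceeding the bound tombstones the oldest live elements rather than deleting them, so merges remain monotone.

```python
# Illustrative Bounded-LWW-Set sketch (assumed semantics from the slides).
class BoundedLWWSet:
    def __init__(self, bound):
        self.bound = bound
        self.entries = {}  # element -> (timestamp, removed?)

    def add(self, element, timestamp):
        self.entries[element] = (timestamp, False)
        self._enforce_bound()

    def _enforce_bound(self):
        live = sorted((ts, e) for e, (ts, rm) in self.entries.items()
                      if not rm)
        # Tombstone the oldest entries until at most `bound` remain live.
        for ts, e in live[: max(0, len(live) - self.bound)]:
            self.entries[e] = (ts, True)

    def merge(self, other):
        out = BoundedLWWSet(self.bound)
        for e in self.entries.keys() | other.entries.keys():
            a, b = self.entries.get(e), other.entries.get(e)
            # Later timestamp wins; a tombstone wins a timestamp tie.
            out.entries[e] = max(x for x in (a, b) if x is not None)
        out._enforce_bound()
        return out

    def value(self):
        return {e for e, (ts, rm) in self.entries.items() if not rm}

# The slide's run, with a bound of 2:
s = BoundedLWWSet(bound=2)
s.add(1, 1)
s.add(2, 2)
print(s.value())     # {1, 2}
s.add(3, 3)          # bound exceeded: 1 is tombstoned, not deleted
print(s.value())     # {2, 3}
print(s.entries[1])  # (1, True) -- tombstone retained for merges
```

Keeping the tombstone (rather than dropping the entry) is what lets a merge at another replica learn that the element was superseded instead of resurrecting it.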
  61. Lasp Fold
     • Fold: computes an aggregate over a set into another type of CRDT
     • Function invariants: the operation must be associative, commutative, and have an inverse operation
  65. [Diagram: folding the bounded Samples set into the Local Avg pair of counters (sum, count). With samples {1}, the pair is (1, 1); after adding 2, (3, 2); after 1 is tombstoned and 3 is added, the inverse operation retracts 1 and the pair becomes (5, 2).]
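The fold above can be sketched as incremental maintenance of a (sum, count) pair. This is an illustrative Python model (the `AverageFold` name and methods are made up): because addition has an inverse, a tombstoned sample is retracted with a subtraction instead of recomputing the aggregate from scratch — which is exactly why the fold function must be associative, commutative, and invertible.

```python
# Illustrative sketch of the fold's invariants: the local average is
# a (sum, count) pair maintained incrementally under adds and removes.
class AverageFold:
    def __init__(self):
        self.sum = 0
        self.count = 0

    def apply_add(self, sample):
        self.sum += sample
        self.count += 1

    def apply_remove(self, sample):
        # The inverse operation: retract a tombstoned sample.
        self.sum -= sample
        self.count -= 1

    def pair(self):
        return (self.sum, self.count)

# The diagram's run:
avg = AverageFold()
avg.apply_add(1)
print(avg.pair())    # (1, 1)
avg.apply_add(2)
print(avg.pair())    # (3, 2)
avg.apply_remove(1)  # sample 1 tombstoned by the bounded set
avg.apply_add(3)
print(avg.pair())    # (5, 2)
```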
  67. Dynamic Scope
     • Variables exist on all nodes: a variable with a given identifier exists on every node, each holding its own value that is not replicated
     • Dynamic fold operation: combines the values across all nodes with a merge, through pairwise synchronization, until a fixed point is reached
  68. [Diagram: dynamic fold propagation from Local Avg to Global Avg: 1) a “bind” message is received; 2) the fold operation is re-executed with the new value; 3) the resulting “bind” is broadcast to peers.]
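The dynamic fold's fixed-point behavior can be sketched as gossip over per-node contributions. This is an illustrative Python model under assumed mechanics (the function names are made up): each node keeps its local (sum, count) contribution keyed by node id, merges peer state entry-wise, and re-derives the global pair; repeated pairwise exchange reaches a fixed point where every node sees every contribution.

```python
# Illustrative sketch: pairwise synchronization of per-node
# contributions until a fixed point, as in the dynamic fold.
def merge(state_a, state_b):
    # Entry-wise max over per-node contributions; each node's entry
    # only grows locally, so max is a safe join.
    return {n: max(state_a.get(n, (0, 0)), state_b.get(n, (0, 0)))
            for n in state_a.keys() | state_b.keys()}

def global_pair(state):
    # Derive the global (sum, count) from all known contributions.
    return (sum(s for s, c in state.values()),
            sum(c for s, c in state.values()))

# Three nodes, each starting with only its own (sum, count).
states = {"n1": {"n1": (10, 2)},
          "n2": {"n2": (6, 3)},
          "n3": {"n3": (4, 1)}}

# Gossip pairwise until no state changes (the fixed point).
changed = True
while changed:
    changed = False
    for a in states:
        for b in states:
            merged = merge(states[a], states[b])
            if merged != states[a]:
                states[a] = merged
                changed = True

print(global_pair(states["n1"]))  # (20, 6) at every node
```

Because the merge is commutative, associative, and idempotent, the order and repetition of the pairwise exchanges do not affect the final value — the tolerance to reordering and duplication that the talk targets.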
  72. In Summary

  75. Related Work
     • Directed/Digest Diffusion: energy-efficient aggregation and program dissemination, with optimizations when aggregations are monotonic; however, not tolerant of some network anomalies
     • Tiny AGgregation (TAG): a declarative method for data collection across sensors using SQL-like syntax; however, not a general programming model
     • PVARS: similar to *Lisp parallel variables, where each processor has its own value and an operation can be applied across nodes
  76. Future Work

  78. Future Work
     • Quantitative evaluation: evaluation and optimization of the prototype implementation in Erlang
     • Optimizations to reduce metadata: apply known CRDT optimizations to both the fold operation and the data structures to reduce space complexity
  79. Thanks! Christopher Meiklejohn, @cmeik