Slide 1

Slide 1 text

The Implementation and Use of a Generic Dataflow Behavior for Erlang

CHRISTOPHER MEIKLEJOHN* AND PETER VAN ROY
Basho Technologies, Inc. and Université catholique de Louvain

We propose a new “generic” abstraction for Erlang/OTP that aids in the implementation of dataflow programming languages and models on the Erlang VM. This abstraction simplifies the implementation of “processing elements” in dataflow languages by providing a simple callback interface in the style of the gen_server and gen_fsm abstractions. We motivate the use of this new abstraction by examining the implementation of a distributed dataflow programming variant called Lasp.

Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming; E.1 [Data Structures]: Distributed data structures
Keywords: Dataflow Programming, Erlang, Concurrent Programming

* Work performed when the author was employed by Basho Technologies, Inc.

Slide 2

Slide 2 text

Motivation

Slide 3

Slide 3 text

• Provide a dataflow abstraction for Erlang in the style of the “generic” abstractions
• Focus on data and control flow
• Support the development of dataflow-based programming models*

* “Derflow”, Bravo et al., Erlang Workshop 2014
 “DerflowL”, Meiklejohn et al., LADIS Workshop 2014
 “Lasp”, Meiklejohn et al., PPDP 2015

Slide 4

Slide 4 text

[Diagram: Intersection of inputs {1,2,3} and {3,4,5} yields {3}]

Slide 5

Slide 5 text

[Diagram: Intersection of inputs {1,2,3,4} and {3,4,5} yields {3,4}]

Slide 6

Slide 6 text

Behaviour

Slide 7

Slide 7 text

• Module:init/1
 Initialize any necessary state
• Module:read/1
 Provide a list of functions used to retrieve the value of each argument to the processing function
• Module:process/2
 Run the computation given the most recently received arguments from each “source”

Slide 8

Slide 8 text

-module(gen_flow_example).
-behaviour(gen_flow).

-export([start_link/1]).
-export([init/1, read/1, process/2]).

-record(state, {pid}).

start_link(Args) ->
    gen_flow:start_link(?MODULE, Args).

init([Pid]) ->
    {ok, #state{pid=Pid}}.

read(State) ->
    ReadFuns = [
        fun(_) -> sets:from_list([1,2,3]) end,
        fun(_) -> sets:from_list([3,4,5]) end
    ],
    {ok, ReadFuns, State}.

process(Args, #state{pid=Pid}=State) ->
    case Args of
        [undefined, _] ->
            ok;
        [_, undefined] ->
            ok;
        [X, Y] ->
            Set = sets:intersection(X, Y),
            Pid ! {ok, sets:to_list(Set)},
            ok
    end,
    {ok, State}.

Slide 9

Slide 9 text

Execution of gen_flow

Slide 10

Slide 10 text

1. Launch read functions
 Launch a linked process to attempt to read all source variables
2. Cache responses
 As responses are received, cache the most recent value
3. Module:process/2
 Re-run the function with the most recent values from the cache
4. Propagate result
 The implementer decides how to propagate values forward with an arbitrary function: via a TCP socket, Erlang message passing, etc.
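
The four steps above can be sketched in plain Erlang. This is a minimal, single-round illustration and not the gen_flow source: the module name flow_sketch, the function run/3, and the message shape {Ref, Index, Value} are all assumptions made for the example. The actual behaviour presumably keeps re-launching read functions rather than stopping after one round.

```erlang
-module(flow_sketch).
-export([run/3]).

%% Hypothetical helper (not part of gen_flow): runs every read function
%% once, caches the replies, and calls ProcessFun after each reply.
%% Returns the final cache and state.
run(ReadFuns, ProcessFun, State0) ->
    Parent = self(),
    Ref = make_ref(),
    Count = length(ReadFuns),
    Indexed = lists:zip(lists:seq(1, Count), ReadFuns),
    %% Step 1: launch one linked process per read function.
    lists:foreach(
        fun({Index, ReadFun}) ->
            spawn_link(fun() -> Parent ! {Ref, Index, ReadFun(undefined)} end)
        end,
        Indexed),
    Cache0 = lists:duplicate(Count, undefined),
    loop(Ref, Count, Cache0, ProcessFun, State0).

loop(_Ref, 0, Cache, _ProcessFun, State) ->
    {Cache, State};
loop(Ref, Pending, Cache0, ProcessFun, State0) ->
    receive
        {Ref, Index, Value} ->
            %% Step 2: cache the most recent value for this source.
            Cache = set_nth(Index, Cache0, Value),
            %% Steps 3 and 4: reprocess with the cached values; ProcessFun
            %% is responsible for propagating any result it computes.
            {ok, State} = ProcessFun(Cache, State0),
            loop(Ref, Pending - 1, Cache, ProcessFun, State)
    end.

%% Replace the N-th element (1-based) of a list.
set_nth(1, [_ | Tail], Value) -> [Value | Tail];
set_nth(N, [Head | Tail], Value) when N > 1 -> [Head | set_nth(N - 1, Tail, Value)].
```

Because the readers are started with spawn_link, a crashing read function also takes down the calling process, which is one way to obtain the collapse-and-restart behaviour when the caller runs under a supervisor.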

Slide 11

Slide 11 text

gen_server gen_server gen_flow gen_server

Slide 12

Slide 12 text

gen_server gen_server gen_flow gen_server 1.

Slide 13

Slide 13 text

gen_server gen_server gen_flow gen_server 1. 2.

Slide 14

Slide 14 text

gen_server gen_server gen_flow gen_server 1. 2. 3.

Slide 15

Slide 15 text

gen_server gen_server gen_flow gen_server 1. 2. 3. 4.

Slide 16

Slide 16 text

• If any of the read functions fails, collapse and restart the entire tree.
• Results are cached for subsequent executions, since processing may require the values from all sources.

Slide 17

Slide 17 text

gen_server gen_server gen_flow gen_server 1.

Slide 18

Slide 18 text

gen_server gen_server gen_flow gen_server 1.

Slide 19

Slide 19 text

gen_server gen_server gen_flow gen_server

Slide 20

Slide 20 text

gen_server gen_server gen_flow gen_server 1.

Slide 21

Slide 21 text

Lasp

Slide 22

Slide 22 text

• Distributed, deterministic dataflow
 Distributed, deterministic dataflow programming model for “eventually consistent” computations
• Prototypical implementation
 Implementation in Erlang using gen_flow
• Convergent modules
 Primary data abstraction is the CRDT

Slide 23

Slide 23 text

RA RB

Slide 24

Slide 24 text

RA RB 1 set(1)

Slide 25

Slide 25 text

RA RB 1 set(1)

Slide 26

Slide 26 text

RA RB 1 set(1) 3 set(3)

Slide 27

Slide 27 text

RA RB 1 set(1) 3 set(3) 2 set(2)

Slide 28

Slide 28 text

RA RB 1 set(1) 3 set(3) 2 set(2)

Slide 29

Slide 29 text

RA RB 1 set(1) 3 set(3) 2 set(2) ? ?

Slide 30

Slide 30 text

• Concurrency
 How do we resolve concurrent operations?
• Semantic resolution
 Resolved by implementer; techniques can be error prone; arbitration results in “anomalies”

Slide 31

Slide 31 text

RA RB 1 set(1)

Slide 32

Slide 32 text

RA RB 1 set(1)

Slide 33

Slide 33 text

RA RB 1 set(1) 3 set(3)

Slide 34

Slide 34 text

RA RB 1 set(1) 3 set(3) 2 set(2)

Slide 35

Slide 35 text

RA RB 1 set(1) 3 set(3) 2 set(2)

Slide 36

Slide 36 text

RA RB 1 set(1) 3 set(3) 2 set(2) 3 3 max(2,3) max(2,3)

Slide 37

Slide 37 text

• Deterministic resolution
 Define a deterministic resolution for objects that closely resembles sequential data structures
• Strong Eventual Consistency (SEC)
 Deterministic convergence guarantee once all messages have been delivered to all “replicas”
• Non-monotonic operations
 How do we handle operations that are nondeterministic?
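
The max(2,3) resolution shown in the preceding replica diagram can be written as a merge function. This is a minimal sketch, assuming integer values ordered numerically; the module name max_merge_sketch and both function names are made up for illustration.

```erlang
-module(max_merge_sketch).
-export([merge/2, converge/1]).

%% Merge two replica values by keeping the larger one. The operation is
%% commutative, associative, and idempotent, so delivery order and
%% duplicate deliveries do not change the outcome.
merge(A, B) -> erlang:max(A, B).

%% Fold merge over the values held by a non-empty list of replicas.
converge([First | Rest]) -> lists:foldl(fun merge/2, First, Rest).
```

Whichever replica merges first, every replica that has seen the values 2 and 3 ends up with 3.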

Slide 38

Slide 38 text

RA RB RC

Slide 39

Slide 39 text

RA RB RC {1} (1, {a}, {}) add(1)

Slide 40

Slide 40 text

RA RB RC {1} (1, {a}, {}) add(1) {1} (1, {b}, {}) add(1)

Slide 41

Slide 41 text

RA RB RC {1} (1, {a}, {}) add(1) {1} (1, {b}, {}) add(1) {} (1, {b}, {b}) remove(1)

Slide 42

Slide 42 text

RA RB RC {1} (1, {a}, {}) add(1) {1} (1, {b}, {}) add(1) {} (1, {b}, {b}) remove(1) {1} {1} {1} (1, {a, b}, {b}) (1, {a, b}, {b}) (1, {a, b}, {b})

Slide 43

Slide 43 text

• External non-monotonicity
 Implemented through the use of monotonic metadata
• Metadata reduction
 Various ways to perform metadata reduction*

* Bieniusa et al., INRIA RR-8083, 2012
 Brown et al., PaPEC 2014
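
The triples in the preceding replica diagrams, such as (1, {a, b}, {b}), pair an element with a set of add tokens and a set of remove tokens. The following is a minimal sketch of such an observed-remove set, assuming a map from element to {AddTokens, RemoveTokens}. It is not the riak_dt_orset implementation, and the module and function names are hypothetical.

```erlang
-module(orset_sketch).
-export([new/0, add/3, remove/2, merge/2, value/1]).

new() -> #{}.

%% Record a unique token for each add; metadata only ever grows.
add(Elem, Token, Set) ->
    {Adds, Removes} = maps:get(Elem, Set, {[], []}),
    Set#{Elem => {ordsets:add_element(Token, Adds), Removes}}.

%% A remove covers only the add tokens observed at this replica.
remove(Elem, Set) ->
    case maps:find(Elem, Set) of
        {ok, {Adds, Removes}} ->
            Set#{Elem => {Adds, ordsets:union(Adds, Removes)}};
        error ->
            Set
    end.

%% Merge is a pairwise union of the token sets for every element.
merge(A, B) ->
    maps:fold(
        fun(Elem, {Adds, Removes}, Acc) ->
            {Adds0, Removes0} = maps:get(Elem, Acc, {[], []}),
            Acc#{Elem => {ordsets:union(Adds, Adds0),
                          ordsets:union(Removes, Removes0)}}
        end,
        A, B).

%% An element is present while at least one add token is not removed.
value(Set) ->
    lists:sort([Elem || {Elem, {Adds, Removes}} <- maps:to_list(Set),
                        ordsets:subtract(Adds, Removes) =/= []]).
```

Replaying the diagram: one replica adds 1 with token a, another adds 1 with token b and then removes it. After merging, the element carries add tokens {a, b} and remove tokens {b}, so 1 is still visible even though the metadata itself never shrank.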

Slide 44

Slide 44 text

Lasp API

Slide 45

Slide 45 text

• declare(T)
 Declare a variable of type T
• bind(X, V)
 Bind a value to a variable; computes the least upper bound of the new and current values
• update(X, Op, Actor)
 Update the value of the variable identified by X with operation Op, on behalf of a unique actor identifier

Slide 46

Slide 46 text

• read(X, V)
 Read variable at a logical time equal to or greater than a previously observed value
• strict_read(X, V)
 Strict inflation version of read(X, V)
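
The difference between the two reads comes down to the check that decides when a read may return. This is a minimal sketch, assuming grow-only sets ordered by inclusion and represented as ordsets; the module and function names are hypothetical and this is not the Lasp implementation.

```erlang
-module(threshold_sketch).
-export([is_inflation/2, is_strict_inflation/2]).

%% read(X, V) may return once the current value is at or above V.
is_inflation(Previous, Current) ->
    ordsets:is_subset(Previous, Current).

%% strict_read(X, V) additionally requires that the value has changed.
is_strict_inflation(Previous, Current) ->
    is_inflation(Previous, Current) andalso Previous =/= Current.
```

With a previously observed value of [1], a current value of [1] satisfies only the non-strict check, while [1,2] satisfies both.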

Slide 47

Slide 47 text

• map(X, F, Y)
 Apply function F over X into Y
• filter(X, P, Y)
 Filter X into Y using predicate P
• fold(X, Op, Y)
 Fold values from X into Y using operation Op

Slide 48

Slide 48 text

• product(X, Y, Z)
 Compute product of X and Y into Z
• union(X, Y, Z)
 Compute union of X and Y into Z
• intersection(X, Y, Z)
 Compute intersection of X and Y into Z

Slide 49

Slide 49 text

%% Create initial set.
{ok, S1} = lasp:declare(riak_dt_orset),

%% Add elements to initial set and update.
{ok, _} = lasp:update(S1, {add_all, [1,2,3]}, a),

%% Create second set.
{ok, S2} = lasp:declare(riak_dt_orset),

%% Apply map operation between S1 and S2.
{ok, _} = lasp:map(S1, fun(X) -> X * 2 end, S2).

Slide 50

Slide 50 text

{1,2,3} Map {2,4,6}

Slide 51

Slide 51 text

Map {2,4,6} {} {1,2,3} Map Map {} {2,4,6} {1,2,3}

Slide 52

Slide 52 text

Map {2,4,6} {} {1,2,3} Map Map {} {2,4,6} {1,2,3}

Slide 53

Slide 53 text

• Read operations
 Reads should not appear to “go back in time”
• Causality
 Functions applied to CRDTs must preserve causal metadata
• Quorum operations
 Take the “merge” of a quorum of replicas
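
As a rough illustration of the quorum bullet, the sketch below merges the first Q replies from a list of replica states. It assumes grow-only sets represented as ordsets, for which merge is union. The name quorum_read/2 and the way replies are collected are assumptions, not the Lasp API.

```erlang
-module(quorum_sketch).
-export([quorum_read/2]).

%% Merge the states returned by the first Q replicas to answer.
%% Replies is a list of ordsets, one per replica that responded.
quorum_read(Replies, Q) when is_integer(Q), Q > 0, length(Replies) >= Q ->
    Quorum = lists:sublist(Replies, Q),
    ordsets:union(Quorum).
```

Because the merged result contains the state of every replica in the quorum, it cannot be older than any single reply it was built from.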

Slide 54

Slide 54 text

Lasp Processes

Slide 55

Slide 55 text

• Processes
 Responsible for the read-then-write cycle of Lasp operations
• Strict inflations
 Lasp operations read forward; do not return the result of a read operation until “time has advanced” logically

Slide 56

Slide 56 text

-module(lasp_process).
-behaviour(gen_flow).

-export([start_link/1]).
-export([init/1, read/1, process/2]).

-record(state, {read_funs, function}).

start_link(Args) ->
    gen_flow:start_link(?MODULE, Args).

%% @doc Initialize state.
init([ReadFuns, Function]) ->
    {ok, #state{read_funs=ReadFuns, function=Function}}.

Slide 57

Slide 57 text

%% @doc Return list of read functions.
read(#state{read_funs=ReadFuns0}=State) ->
    ReadFuns = [gen_read_fun(Id, ReadFun) || {Id, ReadFun} <- ReadFuns0],
    {ok, ReadFuns, State}.

%% @doc Generate ReadFun.
gen_read_fun(Id, ReadFun) ->
    fun(Value0) ->
        Value = case Value0 of
            undefined ->
                undefined;
            {_, _, V} ->
                V
        end,
        {ok, Value1} = ReadFun(Id, {strict, Value}),
        Value1
    end.

Slide 58

Slide 58 text

%% @doc Computation to execute when inputs change.
process(Args, #state{function=Function}=State) ->
    case lists:any(fun(X) -> X =:= undefined end, Args) of
        true ->
            ok;
        false ->
            erlang:apply(Function, Args)
    end,
    {ok, State}.

Slide 59

Slide 59 text

Evaluation

Slide 60

Slide 60 text

• Repeated pattern
 Appeared in previous work on Derflow, DerflowL, and Lasp
• Extracted from Lasp
 gen_flow was extracted from the Lasp runtime
• Code reduction
 Made the Lasp language easier to implement as we added new “operations”
• Macros as read functions
 Replacement of read functions at compile time based on EQC, eunit, or actual execution
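
One way the macro-based replacement of read functions could look is shown below. This is only a guess at the pattern: the macro name READ, the TEST flag, and the stubbed return values are assumptions for illustration, and the real Lasp macros may be organised differently.

```erlang
-module(read_macro_sketch).
-export([read/1]).

%% Pick the read function at compile time. Building with TEST defined
%% (for example via erlc -DTEST) swaps in a stub.
-ifdef(TEST).
-define(READ, fun(_Id) -> {ok, stubbed} end).
-else.
-define(READ, fun(Id) -> {ok, {live, Id}} end).
-endif.

%% Callers use read/1 and never see which variant was compiled in.
read(Id) -> (?READ)(Id).
```

The calling code stays identical across test and production builds; only the macro definition changes.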

Slide 61

Slide 61 text

Recent Work

Slide 62

Slide 62 text

• Spawn via proc_lib
 Spawn via the proc_lib facility for proper supervision
• System messages
 Support system messages to replace or return state
• Debugging
 Support Erlang debugging for tracking received messages
• Elixir
 Elixir is looking to bring ideas from gen_flow into their generic abstraction gen_router

Slide 63

Slide 63 text

Related Work

Slide 64

Slide 64 text

• Kahn Process Networks (KPNs)
 Inspiration for gen_flow
• Flowpools (EPFL 2013)
 Concurrent, lock-free dataflow abstraction over collections
• Javelin (Clojure)
 Cell-based, generic dataflow programming library using Clojure’s protocols
• Riak Pipe (Erlang Workshop 2012)
 Distributed dataflow abstraction; fixed topology using Erlang messaging

Slide 65

Slide 65 text

Future Work

Slide 66

Slide 66 text

• Reduce copying between processes
 Reduce the amount of data copied between processes when reading values.
• Visualization
 Identify a way to visualize the flow of information between processes.

Slide 67

Slide 67 text

Questions?