The Implementation and Use of a Generic Dataflow Behavior for Erlang

Erlang Workshop 2015
Christopher Meiklejohn, Peter Van Roy

September 04, 2015

Transcript

  1. 1.

    The Implementation and Use of a Generic Dataflow Behavior for Erlang

    CHRISTOPHER MEIKLEJOHN* AND PETER VAN ROY
    Basho Technologies, Inc. and Université catholique de Louvain

    We propose a new “generic” abstraction for Erlang/OTP that aids in the implementation of dataflow programming languages and models on the Erlang VM. This abstraction simplifies the implementation of “processing elements” in dataflow languages by providing a simple callback interface in the style of the gen_server and gen_fsm abstractions. We motivate the use of this new abstraction by examining the implementation of a distributed dataflow programming variant called Lasp.

    Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming; E.1 [Data Structures]: Distributed data structures
    Keywords: Dataflow Programming, Erlang, Concurrent Programming

    * Work performed when the author was employed by Basho Technologies, Inc.
  2. 3.

    • Provide a dataflow abstraction for Erlang in the style of the “generic” abstractions
    • Focus on data and control flow
    • Support the development of dataflow-based programming models*

    * “Derflow”, Bravo et al., Erlang Workshop 2014; “DerflowL”, Meiklejohn et al., LADIS Workshop 2014; “Lasp”, Meiklejohn et al., PPDP 2015
  3. 7.

    • Module:init/1
      Initialize any necessary state.
    • Module:read/1
      Provide a list of functions, one per source, used to retrieve the value of each argument to processing.
    • Module:process/2
      Process the computation given the most recently received arguments from each “source”.
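The three callbacks can be written down as a behaviour specification. The following is a hypothetical sketch. The module name gen_flow_spec is invented, and the callback types are inferred from the gen_flow_example slide rather than copied from gen_flow itself.

```erlang
%% Hypothetical behaviour module: callback types are inferred from the
%% gen_flow_example slide, not taken from the gen_flow source.
-module(gen_flow_spec).

%% Build the initial state from the start_link arguments.
-callback init(Args :: list()) -> {ok, State :: term()}.

%% Return one read function per source. Each is called with the value
%% last cached for that source (undefined at first) and returns the next.
-callback read(State :: term()) ->
    {ok, ReadFuns :: [fun((term()) -> term())], State :: term()}.

%% Called with the latest cached value of every source, in read order.
-callback process(Args :: [term()], State :: term()) ->
    {ok, State :: term()}.
```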
  4. 8.

    -module(gen_flow_example).
    -behaviour(gen_flow).

    -export([start_link/1]).
    -export([init/1, read/1, process/2]).

    -record(state, {pid}).

    start_link(Args) ->
        gen_flow:start_link(?MODULE, Args).

    %% Remember the process that should receive results.
    init([Pid]) ->
        {ok, #state{pid=Pid}}.

    %% Two sources, each returning a fixed set.
    read(State) ->
        ReadFuns = [fun(_) -> sets:from_list([1,2,3]) end,
                    fun(_) -> sets:from_list([3,4,5]) end],
        {ok, ReadFuns, State}.

    %% Wait until both sources have a value, then send their intersection.
    process(Args, #state{pid=Pid}=State) ->
        case Args of
            [undefined, _] ->
                ok;
            [_, undefined] ->
                ok;
            [X, Y] ->
                Set = sets:intersection(X, Y),
                Pid ! {ok, sets:to_list(Set)},
                ok
        end,
        {ok, State}.
  5. 10.

    1. Launch read functions
       Launch a linked process to attempt to read all source variables.
    2. Cache responses
       As responses are received, cache the most recent value.
    3. Module:process/2
       Re-run the processing function with the most recent values from the cache.
    4. Propagate result
       The implementer decides how to propagate values forward with an arbitrary function: over a TCP socket, via Erlang message passing, etc.
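The four steps can be sketched in plain Erlang without gen_flow. This is a simplified, hypothetical model. The mini_flow module takes a list of read funs and a processing fun directly instead of a callback module, and it has no supervision or system-message support. It also assumes each read fun blocks until its source has a newer value.

```erlang
-module(mini_flow).
-export([start_link/2]).

%% Hypothetical sketch of the gen_flow loop, not the real gen_flow code.
%% ReadFuns: one fun per source, called with the last value it returned
%% (undefined at first); assumed to block until a newer value exists.
%% ProcessFun: called with the latest cached value of every source.
start_link(ReadFuns, ProcessFun) ->
    spawn_link(fun() -> init(ReadFuns, ProcessFun) end).

init(ReadFuns, ProcessFun) ->
    Flow = self(),
    Sources = lists:zip(lists:seq(1, length(ReadFuns)), ReadFuns),
    %% Step 1: one linked reader per source; a reader crash kills the flow.
    lists:foreach(fun({I, F}) ->
                      spawn_link(fun() -> reader(Flow, I, F, undefined) end)
                  end, Sources),
    Cache = maps:from_list([{I, undefined} || {I, _F} <- Sources]),
    loop(Cache, ProcessFun).

reader(Flow, I, ReadFun, Prev) ->
    Value = ReadFun(Prev),
    Flow ! {value, I, Value},
    reader(Flow, I, ReadFun, Value).

loop(Cache0, ProcessFun) ->
    receive
        {value, I, Value} ->
            %% Step 2: cache the most recent value from this source.
            Cache = Cache0#{I := Value},
            %% Step 3: reprocess with the latest value of every source,
            %% ordered by source index.
            Args = [V || {_I, V} <- lists:sort(maps:to_list(Cache))],
            %% Step 4: ProcessFun decides how the result is propagated.
            ProcessFun(Args),
            loop(Cache, ProcessFun)
    end.
```

With the two sets from gen_flow_example (as ordsets), the processing fun first sees one value and one undefined and skips that round. It then sees both values and can send their intersection, [3].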
  6. 16.

    • If any of the read functions fail, collapse and restart the entire tree.
    • Results are cached for subsequent executions, since processing may require the values of all sources.
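Linking is what produces the collapse. The hypothetical demonstration below uses a flow process that links to two readers without trapping exits. When one reader exits abnormally, the exit signal kills the flow process, and the flow's link then kills the other reader. A supervisor is left to restart the whole tree.

```erlang
-module(collapse_demo).
-export([run/0]).

%% Hypothetical demonstration: a flow process linked to its readers, and
%% not trapping exits, dies as soon as any reader fails.
run() ->
    {Flow, Ref} = spawn_monitor(fun() ->
        %% A healthy reader that would otherwise block forever.
        spawn_link(fun() -> receive after infinity -> ok end end),
        %% A reader whose read fails.
        spawn_link(fun() -> exit(read_failed) end),
        receive after infinity -> ok end
    end),
    receive
        {'DOWN', Ref, process, Flow, Reason} -> Reason
    after 1000 ->
        timeout
    end.
```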
  7. 21.
  8. 22.

    • Distributed, deterministic dataflow
      Distributed, deterministic dataflow programming model for “eventually consistent” computations
    • Prototypical implementation
      Implementation in Erlang using gen_flow
    • Convergent modules
      Primary data abstraction is the CRDT
  9. 23.
  10. 30.

    • Concurrency
      How do we resolve concurrent operations?
    • Semantic resolution
      Resolved by the implementer; techniques can be error-prone; arbitration results in “anomalies”
  11. 36.
  12. 37.

    • Deterministic resolution
      Define a deterministic resolution for objects that closely resembles sequential data structures
    • Strong Eventual Consistency (SEC)
      Deterministic convergence guarantee once all messages have been delivered to all “replicas”
    • Non-monotonic operations
      How do we handle operations that are nondeterministic?
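Strong eventual consistency follows from merge functions that are commutative, associative and idempotent. The sketch below is a grow-only counter, a standard state-based CRDT, not code from riak_dt or Lasp. It merges by pointwise maximum, so replicas that have seen the same states converge regardless of delivery order.

```erlang
-module(gcounter_sketch).
-export([new/0, increment/2, merge/2, value/1]).

%% Hypothetical illustration of a state-based grow-only counter.
%% State: #{Actor => number of increments made by that actor}.
new() -> #{}.

increment(Actor, C) ->
    maps:update_with(Actor, fun(N) -> N + 1 end, 1, C).

%% Pointwise maximum: commutative, associative and idempotent.
merge(A, B) ->
    maps:fold(fun(K, V, Acc) ->
                  maps:update_with(K, fun(W) -> max(V, W) end, V, Acc)
              end, B, A).

value(C) -> lists:sum(maps:values(C)).
```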
  13. 38.
  14. 41.

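    Replicas RA, RB and RC; each state is written (element, add tags, remove tags):
    • add(1) → {1}, state (1, {a}, {})
    • add(1) → {1}, state (1, {b}, {})
    • remove(1) → {}, state (1, {b}, {b})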
  15. 42.

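    Replicas RA, RB and RC; each state is written (element, add tags, remove tags):
    • add(1) → {1}, state (1, {a}, {})
    • add(1) → {1}, state (1, {b}, {})
    • remove(1) → {}, state (1, {b}, {b})
    • After all states are merged, every replica holds {1}, with state (1, {a, b}, {b})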
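The example can be replayed with a simplified observed-remove set that stores, per element, the add tags and the remove tags. This is a hypothetical sketch, not the riak_dt_orset implementation. It assumes that remove(1) is performed at RC after RC has received RB's state (1, {b}, {}).

```erlang
-module(orset_sketch).
-export([new/0, add/3, remove/2, merge/2, value/1]).

%% Hypothetical, simplified observed-remove set (not riak_dt_orset).
%% State: #{Element => {AddTags, RemoveTags}}; both tag sets are ordsets.
new() -> #{}.

%% add/3 records a unique tag for this add (the slide uses replica names).
add(E, Tag, S) ->
    {Adds, Rems} = maps:get(E, S, {[], []}),
    S#{E => {ordsets:add_element(Tag, Adds), Rems}}.

%% remove/2 tombstones only the add tags this replica has observed.
remove(E, S) ->
    case maps:find(E, S) of
        {ok, {Adds, Rems}} -> S#{E := {Adds, ordsets:union(Adds, Rems)}};
        error -> S
    end.

%% merge/2 unions both tag sets per element, a join over the states.
merge(S1, S2) ->
    maps:fold(fun(E, {A1, R1}, Acc) ->
                  {A2, R2} = maps:get(E, Acc, {[], []}),
                  Acc#{E => {ordsets:union(A1, A2), ordsets:union(R1, R2)}}
              end, S2, S1).

%% An element is present if at least one add tag has not been removed.
value(S) ->
    lists:sort([E || {E, {Adds, Rems}} <- maps:to_list(S),
                     ordsets:subtract(Adds, Rems) =/= []]).
```

The concurrent add at RA carries tag a, which RC never observed. Its remove therefore cannot cancel that add, and 1 survives the merge.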
  16. 43.

    • External non-monotonicity
      Implemented through the use of monotonic metadata
    • Metadata reduction
      Various ways to perform metadata reduction*

    * Bieniusa et al., INRIA RR-8083, 2012; Brown et al., PaPEC 2014
  17. 44.
  18. 45.

    • declare(T)
      Declare a variable of type T
    • bind(X, V)
      Bind a value to a variable; computes the least upper bound of the new and current values
    • update(X, Op, Actor)
      Update the value of a variable by applying operation Op, with a unique actor identifier
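A single-node, purely functional model shows how bind differs from assignment. Everything here is hypothetical: the lasp_store_sketch module, the {var, N} identifiers, and the gset and gcounter type names. The store keeps one lattice value per variable. bind joins the new value with the current one, and update applies an operation on behalf of an actor before binding the result.

```erlang
-module(lasp_store_sketch).
-export([new/0, declare/2, bind/3, update/4, value/2]).

%% Hypothetical model of declare/bind/update over two toy lattices:
%%   gset:     an ordset; least upper bound is union
%%   gcounter: #{Actor => Count}; least upper bound is pointwise max
new() -> #{next => 0, vars => #{}}.

declare(Type, #{next := N, vars := Vars} = Store) ->
    Id = {var, N},
    {Id, Store#{next := N + 1, vars := Vars#{Id => {Type, bottom(Type)}}}}.

%% bind/3 never overwrites: it stores lub(Current, New).
bind(Id, New, #{vars := Vars} = Store) ->
    {Type, Cur} = maps:get(Id, Vars),
    Store#{vars := Vars#{Id := {Type, lub(Type, Cur, New)}}}.

%% update/4 applies Op on behalf of Actor, then binds the result.
update(Id, Op, Actor, #{vars := Vars} = Store) ->
    {Type, Cur} = maps:get(Id, Vars),
    bind(Id, apply_op(Type, Op, Actor, Cur), Store).

value(Id, #{vars := Vars}) ->
    {_Type, V} = maps:get(Id, Vars),
    V.

bottom(gset) -> ordsets:new();
bottom(gcounter) -> #{}.

lub(gset, A, B) -> ordsets:union(A, B);
lub(gcounter, A, B) ->
    maps:fold(fun(K, V, Acc) ->
                  maps:update_with(K, fun(W) -> max(V, W) end, V, Acc)
              end, B, A).

apply_op(gset, {add, E}, _Actor, S) -> ordsets:add_element(E, S);
apply_op(gcounter, increment, Actor, C) ->
    maps:update_with(Actor, fun(N) -> N + 1 end, 1, C).
```

Binding a stale counter value (#{a => 1} after two increments) leaves the variable unchanged, because the join keeps the larger count.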
  19. 46.

    • read(X, V)
      Read a variable at a logical time equal to or greater than a previously observed value
    • strict_read(X, V)
      Strict inflation version of read(X, V): returns only once the value is strictly greater than V
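The conditions that read and strict_read wait for can be stated as predicates. The sketch assumes a grow-only set stored as an ordset, where the lattice order is the subset relation. Lasp defines the order per CRDT type. Treating an undefined previous value as always satisfied is also an assumption here.

```erlang
-module(inflation_sketch).
-export([inflation/2, strict_inflation/2]).

%% Hypothetical predicates for a grow-only set held as an ordset.
%% Cur inflates Prev if Prev =< Cur in the lattice (here: subset).
inflation(undefined, _Cur) -> true;
inflation(Prev, Cur) -> ordsets:is_subset(Prev, Cur).

%% A strict inflation additionally requires that the value changed.
strict_inflation(undefined, _Cur) -> true;
strict_inflation(Prev, Cur) -> inflation(Prev, Cur) andalso Prev =/= Cur.
```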
  20. 47.

    • map(X, F, Y)
      Apply function F over X into Y
    • filter(X, P, Y)
      Filter X into Y using predicate P
    • fold(X, Op, Y)
      Fold values from X into Y using operation Op
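Ignoring CRDT metadata, the value-level effect of these operations can be sketched over ordsets. This is a deliberate simplification. Lasp applies the operations to CRDT states and preserves causal metadata, which the hypothetical set_ops_sketch module drops. The fold takes an explicit initial accumulator, which is also an assumption.

```erlang
-module(set_ops_sketch).
-export([map/2, filter/2, fold/3]).

%% Hypothetical value-level sketch over ordsets; CRDT metadata omitted.
%% map/2: apply F to every element; duplicates collapse, as in a set.
map(X, F) -> ordsets:from_list([F(E) || E <- X]).

%% filter/2: keep elements satisfying P (an ordset stays ordered).
filter(X, P) -> [E || E <- X, P(E)].

%% fold/3: combine elements with Op, starting from Acc0.
fold(X, Op, Acc0) -> lists:foldl(Op, Acc0, X).
```

Applied to [1, 2, 3] with fun(E) -> E * 2 end, map/2 gives [2, 4, 6]. That is the value-level result expected in S2 in the Lasp example.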
  21. 48.

    • product(X, Y, Z)
      Compute product of X and Y into Z
    • union(X, Y, Z)
      Compute union of X and Y into Z
    • intersection(X, Y, Z)
      Compute intersection of X and Y into Z
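The same value-level simplification applies to the binary operations, where product pairs every element of X with every element of Y. set_combine_sketch is a hypothetical name. Lasp's real versions also combine CRDT metadata.

```erlang
-module(set_combine_sketch).
-export([product/2, union/2, intersection/2]).

%% Hypothetical value-level view over ordsets; CRDT metadata omitted.
product(X, Y) -> ordsets:from_list([{A, B} || A <- X, B <- Y]).

union(X, Y) -> ordsets:union(X, Y).

intersection(X, Y) -> ordsets:intersection(X, Y).
```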
  22. 49.

    %% Create initial set.
    {ok, S1} = lasp:declare(riak_dt_orset),

    %% Add elements 1, 2 and 3 to the initial set as actor a.
    {ok, _} = lasp:update(S1, {add_all, [1,2,3]}, a),

    %% Create second set.
    {ok, S2} = lasp:declare(riak_dt_orset),

    %% Map S1 into S2, doubling each element.
    {ok, _} = lasp:map(S1, fun(X) -> X * 2 end, S2).
  23. 53.

    • Read operations
      Reads should not appear to “go back in time”
    • Causality
      Functions applied to CRDTs must preserve causal metadata
    • Quorum operations
      Take the “merge” of a quorum of replicas
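A quorum read can be sketched by querying every replica, keeping the first Q replies and joining them. Here replicas are modelled as funs returning grow-only sets (ordsets), and quorum_read_sketch and its arguments are hypothetical. The join is at least as large as every reply in the quorum, so the result never appears older than any replica that answered.

```erlang
-module(quorum_read_sketch).
-export([quorum_read/3]).

%% Hypothetical sketch: each replica is a fun returning an ordset state.
%% Note: Timeout applies to each reply, not to the whole read.
quorum_read(Replicas, Q, Timeout) when Q =< length(Replicas) ->
    Self = self(),
    Ref = make_ref(),
    lists:foreach(fun(Read) -> spawn(fun() -> Self ! {Ref, Read()} end) end,
                  Replicas),
    collect(Ref, Q, ordsets:new(), Timeout).

collect(_Ref, 0, Acc, _Timeout) ->
    {ok, Acc};
collect(Ref, N, Acc, Timeout) ->
    receive
        {Ref, State} -> collect(Ref, N - 1, ordsets:union(Acc, State), Timeout)
    after Timeout ->
        {error, timeout}
    end.
```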
  24. 55.

    • Processes
      Responsible for the read-then-write cycle of Lasp operations
    • Strict inflations
      Lasp operations read forward; they do not return the result of a read operation until “time has advanced” logically
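Strict inflation can be implemented by deferring replies. In this hypothetical single-variable server, strict_read/2 blocks until the stored grow-only set (an ordset) strictly contains the caller's previous value. Each bind joins in new elements and wakes any reader whose condition now holds.

```erlang
-module(strict_read_sketch).
-export([start/1, bind/2, strict_read/2]).

%% Hypothetical server holding one grow-only set (an ordset).
start(Initial) ->
    spawn(fun() -> loop(Initial, []) end).

bind(Pid, New) ->
    Pid ! {bind, New},
    ok.

%% Blocks until the stored value is a strict inflation of Prev.
strict_read(Pid, Prev) ->
    Ref = make_ref(),
    Pid ! {read, self(), Ref, Prev},
    receive {Ref, Value} -> Value end.

loop(Value, Waiting) ->
    receive
        {bind, New} ->
            Value1 = ordsets:union(Value, New),
            loop(Value1, reply_ready(Value1, Waiting));
        {read, From, Ref, Prev} ->
            loop(Value, reply_ready(Value, [{From, Ref, Prev} | Waiting]))
    end.

%% Answer every reader whose condition holds; keep the rest waiting.
reply_ready(Value, Waiting) ->
    {Ready, Blocked} =
        lists:partition(fun({_From, _Ref, Prev}) -> strict(Prev, Value) end,
                        Waiting),
    lists:foreach(fun({From, Ref, _Prev}) -> From ! {Ref, Value} end, Ready),
    Blocked.

strict(Prev, Value) ->
    ordsets:is_subset(Prev, Value) andalso Prev =/= Value.
```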
  25. 56.

    -module(lasp_process).
    -behaviour(gen_flow).

    -export([start_link/1]).
    -export([init/1, read/1, process/2]).

    -record(state, {read_funs, function}).

    start_link(Args) ->
        gen_flow:start_link(?MODULE, Args).

    %% @doc Initialize state.
    init([ReadFuns, Function]) ->
        {ok, #state{read_funs=ReadFuns, function=Function}}.
  26. 57.

    %% @doc Return list of read functions.
    read(#state{read_funs=ReadFuns0}=State) ->
        ReadFuns = [gen_read_fun(Id, ReadFun) || {Id, ReadFun} <- ReadFuns0],
        {ok, ReadFuns, State}.

    %% @doc Generate ReadFun.
    %% Value0 is the previously cached read result, or undefined before the
    %% first read; its third element is the value the next read must
    %% strictly inflate.
    gen_read_fun(Id, ReadFun) ->
        fun(Value0) ->
            Value = case Value0 of
                undefined ->
                    undefined;
                {_, _, V} ->
                    V
            end,
            {ok, Value1} = ReadFun(Id, {strict, Value}),
            Value1
        end.
  27. 58.

    %% @doc Computation to execute when inputs change.
    %% Skip until every source has produced a value, then apply Function
    %% to the cached values.
    process(Args, #state{function=Function}=State) ->
        case lists:any(fun(X) -> X =:= undefined end, Args) of
            true ->
                ok;
            false ->
                erlang:apply(Function, Args)
        end,
        {ok, State}.
  28. 60.

    • Repeated pattern
      Appeared in previous work on Derflow, DerflowL, and Lasp
    • Extracted from Lasp
      gen_flow was extracted from the Lasp runtime
    • Code reduction
      Eased the implementation of the Lasp language as we added new “operations”
    • Macros as read functions
      Replacement of read functions at compile time based on EQC, eunit, or actual execution
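Swapping read functions at compile time can be done with the preprocessor. In this hypothetical illustration, compiling with {d, 'TEST'} selects a canned reader for unit tests. The default build uses real_read/1, a stand-in for the production read.

```erlang
-module(read_fun_macro_sketch).
-export([read_funs/0]).

%% Hypothetical: choose the read function when the module is compiled.
-ifdef(TEST).
%% Test build: return a fixed value, no store needed.
-define(READ_FUN, fun(_Prev) -> [1, 2, 3] end).
-else.
%% Default build: delegate to the (placeholder) production read.
-define(READ_FUN, fun(Prev) -> real_read(Prev) end).
-endif.

read_funs() -> [?READ_FUN].

-ifndef(TEST).
%% Placeholder for a real read, e.g. a strict read against a store.
real_read(undefined) -> [];
real_read(Prev) -> Prev.
-endif.
```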
  29. 62.

    • Spawn via proc_lib
      Spawn via the proc_lib facility for proper supervision
    • System messages
      Support system messages to replace or return state
    • Debugging
      Support Erlang debugging for tracking received messages
    • Elixir
      Elixir is looking to bring ideas from gen_flow into its generic abstraction gen_router
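The proc_lib and system-message points follow OTP's "special process" conventions. The sketch below is not gen_flow's code, and the module name is hypothetical. It is a minimal process that starts via proc_lib:start_link/3 and passes system messages to sys:handle_system_msg/6. It also implements the sys callbacks, so that sys:get_state/1, sys:replace_state/2, sys:suspend/1 and sys:trace/2 work on it.

```erlang
-module(sys_aware_sketch).
-export([start_link/1, init/2]).
-export([system_continue/3, system_terminate/4, system_get_state/1,
         system_replace_state/2, system_code_change/4]).

%% Hypothetical minimal special process (not gen_flow itself).
start_link(State) ->
    proc_lib:start_link(?MODULE, init, [self(), State]).

init(Parent, State) ->
    Debug = sys:debug_options([]),
    proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, Debug, State).

loop(Parent, Debug, State) ->
    receive
        {system, From, Request} ->
            %% sys takes over and later calls system_continue/3.
            sys:handle_system_msg(Request, From, Parent, ?MODULE, Debug, State);
        Msg ->
            %% Recorded only if sys:trace/2 or sys:log/2 is enabled.
            Debug1 = sys:handle_debug(Debug, fun print_event/3, State, {in, Msg}),
            loop(Parent, Debug1, State)
    end.

print_event(Dev, Event, State) ->
    io:format(Dev, "~p: ~p~n", [State, Event]).

system_continue(Parent, Debug, State) -> loop(Parent, Debug, State).

system_terminate(Reason, _Parent, _Debug, _State) -> exit(Reason).

system_get_state(State) -> {ok, State}.

system_replace_state(StateFun, State) ->
    NewState = StateFun(State),
    {ok, NewState, NewState}.

system_code_change(State, _Module, _OldVsn, _Extra) -> {ok, State}.
```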
  30. 64.

    • Kahn Process Networks (KPNs)
      Inspiration for gen_flow
    • Flowpools (EPFL 2013)
      Concurrent, lock-free dataflow abstraction over collections
    • Javelin (Clojure)
      Cell-based, generic dataflow programming library using Clojure’s protocols
    • Riak Pipe (Erlang Workshop 2012)
      Distributed dataflow abstraction; fixed topology using Erlang messaging
  31. 66.

    • Reduce copying between processes
      Reduce the amount of data copied between processes when reading values.
    • Visualization
      Identify a way to visualize the flow of information between processes.