The Implementation and Use of a Generic Dataflow Behavior for Erlang

Erlang Workshop 2015
Christopher Meiklejohn, Peter Van Roy

Christopher Meiklejohn

September 04, 2015

Transcript

  1. The Implementation and Use of a Generic Dataflow Behavior for Erlang
    CHRISTOPHER MEIKLEJOHN* AND PETER VAN ROY
    Basho Technologies, Inc. and Université catholique de Louvain
    We propose a new “generic” abstraction for Erlang/OTP that aids in the implementation of dataflow programming languages and models on the Erlang VM. This abstraction simplifies the implementation of “processing elements” in dataflow languages by providing a simple callback interface in the style of the gen_server and gen_fsm abstractions. We motivate the use of this new abstraction by examining the implementation of a distributed dataflow programming variant called Lasp.
    Categories and Subject Descriptors: D.1.3 [Programming Techniques]: Concurrent Programming; E.1 [Data Structures]: Distributed data structures
    Keywords: Dataflow Programming, Erlang, Concurrent Programming
    * work performed when the author was employed by Basho Technologies, Inc.
  2. • Provide a dataflow abstraction for Erlang in the style of the “generic” abstractions
    • Focuses on data and control flow
    • Support the development of dataflow based programming models*
    * “Derflow”, Bravo et al., Erlang Workshop 2014
      “DerflowL”, Meiklejohn et al., LADIS Workshop 2014
      “Lasp”, Meiklejohn et al., PPDP 2015
  3. • Module:init/1
      Initialize any necessary state
    • Module:read/1
      Provide a list of functions, one per “source”, that are used to retrieve the value of each argument passed to Module:process/2
    • Module:process/2
      Perform the computation given the most recently received arguments from each “source”
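The three callbacks above could be declared as a behaviour roughly as follows. This is only a sketch: the module name gen_flow_callbacks_sketch and the type annotations are assumptions inferred from the example on the next slide, not the actual gen_flow source.

```erlang
-module(gen_flow_callbacks_sketch).

%% Hypothetical callback declarations; the types are inferred from the
%% example module in this deck and are not taken from gen_flow itself.

%% Initialize any necessary state.
-callback init(Args :: list()) ->
    {ok, State :: term()}.

%% Return one read function per source, plus the (possibly updated) state.
-callback read(State :: term()) ->
    {ok, ReadFuns :: [fun((term()) -> term())], NewState :: term()}.

%% Run the computation over the most recent value from each source.
-callback process(Args :: [term()], State :: term()) ->
    {ok, NewState :: term()}.
```

A module that declares -behaviour(gen_flow_callbacks_sketch) would then get compiler warnings for any of these callbacks it fails to export.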
  4. -module(gen_flow_example).
    -behaviour(gen_flow).

    -export([start_link/1]).
    -export([init/1, read/1, process/2]).

    -record(state, {pid}).

    start_link(Args) ->
        gen_flow:start_link(?MODULE, Args).

    %% Remember the pid that results are sent to.
    init([Pid]) ->
        {ok, #state{pid=Pid}}.

    %% One read function per source; each returns a fixed set here.
    read(State) ->
        ReadFuns = [
            fun(_) -> sets:from_list([1,2,3]) end,
            fun(_) -> sets:from_list([3,4,5]) end
        ],
        {ok, ReadFuns, State}.

    %% Once both sources have a value, send their intersection to Pid.
    process(Args, #state{pid=Pid}=State) ->
        case Args of
            [undefined, _] ->
                ok;
            [_, undefined] ->
                ok;
            [X, Y] ->
                Set = sets:intersection(X, Y),
                Pid ! {ok, sets:to_list(Set)},
                ok
        end,
        {ok, State}.
  5. 1. Launch read functions
      Launch a linked process to attempt to read all source variables
    2. Cache responses
      As responses are received, cache the most recent value
    3. Module:process/2
      Reprocess the function with the most recent values from the cache
    4. Propagate result
      The implementer can decide how to propagate values forward with an arbitrary function; via a TCP socket, Erlang message passing, etc.
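The four steps above can be sketched in plain Erlang as a single round of reads. This is a simplified illustration under stated assumptions, not gen_flow's implementation: the module flow_loop_sketch and the function run/2 are hypothetical, one linked process is spawned per read function, and the loop stops after every source has answered once.

```erlang
-module(flow_loop_sketch).
-export([run/2]).

%% Hypothetical sketch of one read/cache/process round; not gen_flow's code.
%% ReadFuns: list of fun(PreviousValue) -> Value, one per source.
%% Process:  fun(Args) -> term(), where Args holds one cached value per source.
run(ReadFuns, Process) ->
    Parent = self(),
    Tag = make_ref(),
    Indexes = lists:seq(1, length(ReadFuns)),
    %% Step 1: launch one linked process per read function.
    lists:foreach(
      fun({Index, ReadFun}) ->
              spawn_link(fun() -> Parent ! {Tag, Index, ReadFun(undefined)} end)
      end, lists:zip(Indexes, ReadFuns)),
    Cache = maps:from_list([{Index, undefined} || Index <- Indexes]),
    loop(Tag, Indexes, Process, Cache, length(ReadFuns)).

%% Stop once every source has answered once and return the cached values.
loop(_Tag, Indexes, _Process, Cache, 0) ->
    [maps:get(Index, Cache) || Index <- Indexes];
loop(Tag, Indexes, Process, Cache0, Pending) ->
    receive
        {Tag, Index, Value} ->
            %% Step 2: cache the most recent value for this source.
            Cache = maps:put(Index, Value, Cache0),
            %% Step 3: reprocess with the most recent value of every source.
            Args = [maps:get(I, Cache) || I <- Indexes],
            %% Step 4: how results are propagated is left to Process.
            _ = Process(Args),
            loop(Tag, Indexes, Process, Cache, Pending - 1)
    end.
```

Because the readers are linked and the caller does not trap exits, a crashing read function takes the caller down with it, which is one way to obtain the collapse described on the next slide.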
  6. • If any of the read functions fails, collapse and restart the entire tree.
    • Results are cached for subsequent executions, since processing may require all of them.
  7. • Distributed, deterministic dataflow
      Distributed, deterministic dataflow programming model for “eventually consistent” computations
    • Prototypical implementation
      Implementation in Erlang using gen_flow
    • Convergent modules
      Primary data abstraction is the CRDT
  8. • Concurrency
      How do we resolve concurrent operations?
    • Semantic resolution
      Resolved by implementer; techniques can be error prone; arbitration results in “anomalies”
  9. • Deterministic resolution
      Define a deterministic resolution for objects that closely resembles sequential data structures
    • Strong Eventual Consistency (SEC)
      Deterministic convergence guarantee once all messages have been delivered to all “replicas”
    • Non-monotonic operations
      How do we handle operations that are nondeterministic?
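  10. [Figure: replicas RA, RB, and RC, each operation shown with the resulting set value and its metadata]
    add(1): value {1}, metadata (1, {a}, {})
    add(1): value {1}, metadata (1, {b}, {})
    remove(1): value {}, metadata (1, {b}, {b})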
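  11. [Figure: replicas RA, RB, and RC, each operation shown with the resulting set value and its metadata]
    add(1): value {1}, metadata (1, {a}, {})
    add(1): value {1}, metadata (1, {b}, {})
    remove(1): value {}, metadata (1, {b}, {b})
    Final state shown at all three replicas: value {1}, metadata (1, {a, b}, {b})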
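The convergence in the figure can be reproduced with a small merge function. This sketch assumes each triple reads as (element, add tokens, remove tokens) and that an element is present when at least one add token has not been removed. The module orset_sketch is hypothetical and is not the riak_dt_orset implementation.

```erlang
-module(orset_sketch).
-export([merge/2, value/1, demo/0]).

%% Assumed state: a map Element => {AddTokens, RemoveTokens},
%% with both token collections stored as ordsets.

%% Merge two replica states by taking the union of both token sets
%% for every element.
merge(A, B) ->
    maps:fold(
      fun(Element, {Adds, Removes}, Acc) ->
              {Adds0, Removes0} = maps:get(Element, Acc, {[], []}),
              maps:put(Element,
                       {ordsets:union(Adds, Adds0),
                        ordsets:union(Removes, Removes0)},
                       Acc)
      end, A, B).

%% An element is in the set while some add token has not been removed.
value(State) ->
    lists:sort([Element || {Element, {Adds, Removes}} <- maps:to_list(State),
                           ordsets:subtract(Adds, Removes) =/= []]).

%% Merge the two divergent states from the figure.
demo() ->
    AfterAdd = #{1 => {[a], []}},       %% (1, {a}, {})
    AfterRemove = #{1 => {[b], [b]}},   %% (1, {b}, {b})
    Merged = merge(AfterAdd, AfterRemove),
    {Merged, value(Merged)}.
```

Here the add with token a survives the removal of token b, so the merged value contains 1, matching the final state (1, {a, b}, {b}) in the figure.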
  12. • External non-monotonicity
      Implemented through the use of monotonic metadata
    • Metadata reduction
      Various ways to perform metadata reduction*
    * Bieniusa et al., INRIA RR-8083, 2012
      Brown et al., PaPEC 2014
  13. • declare(T)
      Declare a variable of type T
    • bind(X, V)
      Bind a value to a variable; computes the least upper bound of the new and current values
    • update(X, Op, Actor)
      Update the value of variable X by applying operation Op on behalf of a unique actor identifier
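The least-upper-bound behaviour of bind can be illustrated with a grow-only set, where the least upper bound of two values is their union. This is a sketch under that assumption; bind_sketch is a hypothetical module and does not reflect how Lasp stores variables.

```erlang
-module(bind_sketch).
-export([new/0, bind/2]).

%% Hypothetical variable holding a grow-only set, stored as an ordset.
new() ->
    ordsets:new().

%% Binding never overwrites: the result is the least upper bound of the
%% current and new values, which for sets ordered by inclusion is the union.
bind(Current, New) ->
    ordsets:union(Current, ordsets:from_list(New)).
```

Binding a stale value such as [1] to a variable that already holds [1,2,3] leaves it unchanged, so repeated or reordered binds still converge to the same value.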
  14. • read(X, V)
      Read variable at a logical time equal to or greater than a previously observed value
    • strict_read(X, V)
      Strict inflation version of read(X, V)
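The difference between the two reads comes down to the condition a current value must satisfy relative to the previously observed one. A minimal sketch for grow-only sets follows. The module threshold_sketch and both function names are hypothetical, and both arguments are assumed to be ordsets.

```erlang
-module(threshold_sketch).
-export([is_inflation/2, is_strict_inflation/2]).

%% read(X, V) can return once the current value is equal to or greater
%% than the previously observed value; for sets this is the subset test.
is_inflation(Previous, Current) ->
    ordsets:is_subset(Previous, Current).

%% strict_read(X, V) additionally requires that the value has changed.
is_strict_inflation(Previous, Current) ->
    is_inflation(Previous, Current) andalso Previous =/= Current.
```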
  15. • map(X, F, Y)
      Apply function over X into Y
    • filter(X, P, Y)
      Filter X into Y using predicate P
    • fold(X, Op, Y)
      Fold values from X into Y using operation Op
  16. • product(X, Y, Z)
      Compute product of X and Y into Z
    • union(X, Y, Z)
      Compute union of X and Y into Z
    • intersection(X, Y, Z)
      Compute intersection of X and Y into Z
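At the level of plain set values, the three binary operations correspond to the following functions. This sketch ignores CRDT metadata and the dataflow wiring entirely: set_ops_sketch is a hypothetical module that returns the value that would flow into Z, with X and Y assumed to be ordsets.

```erlang
-module(set_ops_sketch).
-export([product/2, union/2, intersection/2]).

%% Cartesian product: every pair {A, B} with A from X and B from Y.
product(X, Y) ->
    ordsets:from_list([{A, B} || A <- X, B <- Y]).

%% Elements that appear in X, in Y, or in both.
union(X, Y) ->
    ordsets:union(X, Y).

%% Elements that appear in both X and Y.
intersection(X, Y) ->
    ordsets:intersection(X, Y).
```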
  17. %% Create initial set.
    {ok, S1} = lasp:declare(riak_dt_orset),

    %% Add elements to initial set and update.
    {ok, _} = lasp:update(S1, {add_all, [1,2,3]}, a),

    %% Create second set.
    {ok, S2} = lasp:declare(riak_dt_orset),

    %% Apply map operation between S1 and S2.
    {ok, _} = lasp:map(S1, fun(X) -> X * 2 end, S2).
  18. • Read operations
      Reads should not appear to “go back in time”
    • Causality
      Functions applied to CRDTs must preserve causal metadata
    • Quorum operations
      Take the “merge” of a quorum of replicas
  19. • Processes
      Responsible for the read-then-write cycle of Lasp operations
    • Strict inflations
      Lasp operations read forward; do not return the result of a read operation until “time has advanced” logically
  20. -module(lasp_process).
    -behaviour(gen_flow).

    -export([start_link/1]).
    -export([init/1, read/1, process/2]).

    -record(state, {read_funs, function}).

    start_link(Args) ->
        gen_flow:start_link(?MODULE, Args).

    %% @doc Initialize state.
    init([ReadFuns, Function]) ->
        {ok, #state{read_funs=ReadFuns, function=Function}}.
  21. %% @doc Return list of read functions.
    read(#state{read_funs=ReadFuns0}=State) ->
        ReadFuns = [gen_read_fun(Id, ReadFun) || {Id, ReadFun} <- ReadFuns0],
        {ok, ReadFuns, State}.

    %% @doc Generate ReadFun.
    gen_read_fun(Id, ReadFun) ->
        fun(Value0) ->
            Value = case Value0 of
                undefined ->
                    undefined;
                {_, _, V} ->
                    V
            end,
            {ok, Value1} = ReadFun(Id, {strict, Value}),
            Value1
        end.
  22. %% @doc Computation to execute when inputs change.
    process(Args, #state{function=Function}=State) ->
        case lists:any(fun(X) -> X =:= undefined end, Args) of
            true ->
                ok;
            false ->
                erlang:apply(Function, Args)
        end,
        {ok, State}.
  23. • Repeated pattern
      Appeared in previous work on Derflow, DerflowL, and Lasp
    • Extracted from Lasp
      gen_flow was extracted from the Lasp runtime
    • Code reduction
      Eased implementation of the Lasp language as we added new “operations”
    • Macros as read functions
      Replacement of read functions at compile time based on EQC, eunit, or actual execution
  24. • Spawn via proc_lib
      Spawn via the proc_lib facility for proper supervision
    • System messages
      Support system messages to replace or return state
    • Debugging
      Support Erlang debugging for tracking received messages
    • Elixir
      Elixir is looking to bring ideas from gen_flow into their generic abstraction gen_router
  25. • Kahn Process Networks (KPNs)
      Inspiration for gen_flow
    • Flowpools (EPFL 2013)
      Concurrent, lock-free dataflow abstraction over collections
    • Javelin (Clojure)
      Cell-based, generic dataflow programming library using Clojure’s protocols
    • Riak Pipe (Erlang Workshop 2012)
      Distributed dataflow abstraction; fixed topology using Erlang messaging
  26. • Reduce copying between processes
      Reduce the amount of data copied between processes when reading values.
    • Visualization
      Identify a way to visualize the flow of information between processes.