Towards a Systems Approach to Distributed Programming

Towards a Systems Approach to Distributed Programming

OBT 2018
Los Angeles, California

3e09fee7b359be847ed5fa48f524a3d3?s=128

Christopher Meiklejohn

January 14, 2018
Tweet

Transcript

  1. TOWARDS A SYSTEMS APPROACH TO DISTRIBUTED PROGRAMMING Christopher S. Meiklejohn,

    Peter Van Roy Université catholique de Louvain, Instituto Superior Técnico
  2. DISTRIBUTED APPLICATIONS Distributed applications are the “new normal”  Rich-web

    applications, application inside the datacenter  Example: Uber  Ride uses multiple microservices for supply, demand, and incentives Applications typically use several microservices and systems  Potentially using different databases as underlying storage  Potentially written in different programming languages depending on group/organization Application developers write:  Business logic  Code to “glue” services together – using messaging APIs, services APIs, etc
  3. COMPOSITION PROBLEM API differences  Connecting systems with different guarantees:

    at-most-once vs. at-least-once messaging  Durability guarantees differ between systems, might not match specification APIs not specified  Mostly defined by implementation  Semantics not well defined, and may differ depend on implementations  Might not actually work according to specification Composition doesn’t preserve isolated semantics  Apache Kafka and Apache Zookeeper experimentally evaluated in isolation  Under composition, Kafka loses data (Kingsbury 2015)
  4. PROPOSAL Can we build a higher-level programming language for building

    distributed applications that compiles and executes on existing distributed systems infrastructure? Lasp is our first attempt at this.
  5. LASP (PPDP ‘15) Declarative programming system that allows for distributed

    programming with co-designed runtime system  Implemented as an Erlang library Dataflow programming model CRDTs: ADTs for distributed programming  Data types containing a binary merge function for joining two replicas or updates  Used for value convergence under divergence introduced by concurrent modifications Programming model assumes eventual consistency  Updates will be eventually delivered  Updates may be delivered more than once  Updates may be delivered in any order
  6. Lasp API LASP ARCHITECTURE Multiple backends. Multiple storage. Unified programming

    model. Lasp KVS Data Types Cluster Management gen_tcp RabbitMQ P2P Full KV Backend Redis Riak ETS/DETS Client/Server Multiple backends to storage, some replicated, some not replicated. Multiple network topologies. Riak Riak
  7. CRDT IMPLEMENTATIONS State-based (2011)  Bounded join-semilattices with join as

    merge  Replicas propagate full objects to one another Delta-state based (2015)  Bounded join-semilattices with join as merge  Replicas propagate minimal change representation (join-irreducible) based on knowledge  Requires FIFO ordering between nodes Operation-based (2011)  Concurrent updates are commutative  Propagated updates are generated effectors including causal metadata Pure operation-based (2015)  Sequential data type stored  Operations applied to data type at causal stability (requires causality)  Concurrent updates are commutative  Propagated updates are original update Conflict-Free Replicated Data Types come in a variety of different implementations, depending on the guarantees the system can get from the underlying system and network topology. For our examples, we focus on state-based CRDTs.
  8. CRDT TAXONOMY Sets  Grow-Only Set (G-Set)  Union as

    join  Two-Phase Set (2P-Set)  Pair of G-Sets (add and remove G-Sets)  Coordinate-wise union as join  Observed-Remove Set (OR-Set)  Tag updates with unique identifier marking add/remove  Coordinate-wise union of add and remove tokens per item Counters  Grow-Only Counter (G-Counter)  Vector clock  Coordinate-wise maximum  Positive-Negative Counter (PN-Counter)  Pair of vector clocks  Coordinate-wise maximum Others  Maps, Registers, Booleans, Graphs… We will consider only one implementation of CRDTs at the moment: the state-based implementation that are formalized as bounded join-semilattices. As a brief presentation, sets and counters only.
  9. LASP EXAMPLE %% Create a set A = declare(set) %%

    Derive a new set B = product(A, filter(P, A)) %% Create concurrent process %% to insert into set process do insert(A, random()) end Creates a join-semilattice representation of a set (formalized as CRDT) Creates a morphism to a join-semilattice B under image of product/filter Concurrent additions produce a ‘join’ with A’s state; triggers update of B Variables are asynchronously replicated across all nodes in the system
  10. Lasp API LASP ARCHITECTURE Multiple backends. Multiple storage. Unified programming

    model. Lasp KVS Data Types Cluster Management gen_tcp RabbitMQ P2P Full KV Backend Redis Riak ETS/DETS Client/Server Riak Riak
  11. TYPE SPECIALIZATION If no set removals:  Can we specialize

    to a G-Set? If a single set removal:  Can we specialize to a 2P-Set? If we don’t decrement the counter:  Can we specialize to a G-Counter? Can we choose an appropriate CRDT based on some knowledge of the program? Currently, this must be specified manually by the programmer by declaring the type.
  12. IMPLEMENTATION SPECIALIZATION State-based (2011)  Full semilattice stored and transmitted

     Only requires eventual delivery of messages Delta-state based (2015)  Only transmits change representation as join- irreducible element  Requires per-node FIFO delivery  Ideal for partially connected peer-to-peer due to buffering for FIFO delivery Pure operation-based (2015)  Requires causal delivery and protocol for causal stability which may be problematic in certain high- churn environments  Most efficient in transmission and storage Can we choose an appropriate CRDT based on some knowledge of the program? Delta can be enabled at runtime through configuration parameter. Pure operation-based is WIP.
  13. Lasp API LASP ARCHITECTURE Multiple backends. Multiple storage. Unified programming

    model. Lasp KVS Data Types Cluster Management gen_tcp RabbitMQ P2P Full KV Backend Redis Riak ETS/DETS Client/Server Riak Riak
  14. KV BACKEND Storage for variable state Underlying storage can provide

    any consistency model  Eventual consistency is assumed by the programming model  May be replicated or not, depending on required durability guarantees Supported storage engines  Riak  Redis  Erlang DETS/ETS Lasp KV provides replication between nodes  Full replication assumed  Partial replication provided based on “topics” Underlying storage for the Lasp programming system is pluggable. Topics provide “opt-in” replication for certain variables, grouped by a namespace.
  15. Lasp API LASP ARCHITECTURE Multiple backends. Multiple storage. Unified programming

    model. Lasp KVS Data Types Cluster Management gen_tcp RabbitMQ P2P Full KV Backend Redis Riak ETS/DETS Client/Server Riak Riak
  16. CLUSTER MANAGEMENT Runtime parameter specifies the cluster topology  Static

     Full mesh (via Distributed Erlang)  Full mesh (without Distributed Erlang)  Client-server  Peer-to-peer (via HyParView-inspired protocol)  Publish-subscribe (via external AMQP endpoint) Programming model is topology-agnostic  Peer-to-peer topology will forward messages to other nodes on behalf of others to provide point-to- point messaging  Assumes best-effort delivery Underlying storage for the Lasp programming system is pluggable. Standalone system for Erlang (build by our group for Lasp and adopted in industry) Partisan (our library) operates using a simple message forwarding API that Lasp builds on.
  17. LASP RESTRICTIONS Lasp assumes eventual consistency  Flexibility in the

    choice of the underlying data store and network topologies  Relies on Conflict-Free Replicated Data Types to deal with inherent issues of EC  Out-ot-order message delivery  Duplicate message delivery  Limits the types of applications that can be written with the programming model Eventual consistency is not strong enough for some operations  Not all operations can have a sensible merge operation (assignment to register)  Atomically changing multiple variables (transactions, etc.)  Causality required for ordering of some operations (secondary indexing, references)  Coordination required for precondition-based operations (check-and-set, etc.)
  18. Programming Model IDEAL ARCHITECTURE Multiple backends. Multiple storage. Unified programming

    model. Application State Storage Data Types Distribution Layer TCP RabbitMQ P2P Full Storage Backend Traditional Storage Riak Ephemeral Storage Client/Server Riak Distributed Storage UDP
  19. Programming Model IDEAL ARCHITECTURE Multiple backends. Multiple storage. Unified programming

    model. Application State Storage Data Types Distribution Layer TCP RabbitMQ P2P Full Storage Backend Traditional Storage Riak Ephemeral Storage Client/Server Riak Distributed Storage UDP Focus on the programming model and the open problems.
  20. RELATED WORK Transactional-facility for atomic updates across several objects 

    Microsoft Orleans, Bernstein et al., MSR-TR-2016-1001  Transactions integrated into an actor language on arbitrary storage  Argus, Liskov, CACM 1988  Transactions via guardians in distributed language on CLU Causal ordering of updates for sequential reasoning about application code  Quelea, Sivaramakrishnan et al., PLDI 2015  Assumes eventually consistent storage, providing causal guarantees on top Analysis to determine where coordination is required for precondition invariants  CISE, Gotsman et al., POPL 2016  Assumes causality in underlying storage engine
  21. OPEN CHALLENGES: MODEL Leverage CRDTs where possible  Is it

    possible to determine where sequential data type instances can be replaced with CRDTs?  For example: compatible data types, integer with assignment vs. counter with increment, blind writes Leverage EC where possible  Causal reasoning is important to observe effects immediately  For example: creating a reference to an object, inserting an item into a collection based on a monotonic condition Automatically specialize types  Is it possible to analyze programs to determine the right CRDT implementation or type?  For example: no inserts, use a G-Set; no decrements, use a G-Counter
  22. OPEN CHALLENGES: COMPILATION Topology-aware compiler  Is it possible to,

    based on the topology, prevent the compilation of certain programs?  For example, certain network topologies might not be able to provide transactions:  Peer-to-peer, when the network is partially connected  Publish-subscribe topologies, where clients may not receive messages until they come online Storage-aware compiler  When operating on storage that is weaker than causal consistency, what should happen?  For example:  On Riak, causality is not possible. Therefore, what should happen?  Prevent compilation, disallowing operations that require causality?  Allow compilation, but use a middleware layer to ensure causality?
  23. OPEN CHALLENGES: INFRASTRUCTURE System-awareness  Given a choice of underlying

    data stores, how do I know which one to pick?  Applications might require different stores: strong consistency, causal consistency, weak consistency?  Desire the most available, weakest consistency model – is this compatible with application invariants?  Given a choice of underlying network / communication layers, which are compatible?  What stores should not be allowed given particular storage engines? What about the inverse? Infrastructure ‘glue’ correctness  How do I know the code used to interact with the underlying data store is correct?  For example:  Am I interacting with the underlying storage system correctly?  How do I designate which consistency levels and/or guarantees the underlying infrastructure provides?
  24. TAKEAWAYS Challenge of new runtimes, models, and languages  New

    runtimes for distribution can hardly compete with existing systems solutions  Systems are designed with a particular use case in mind, with an optimized algorithm for that use case  Systems industrialization has higher adoption in industry than new programming models/languages Existing languages are too primitive  Established, proven runtime systems  Mostly concurrent only (even that, is recent)  Lack of primitives for distributed computing with some notable exceptions Can we establish a joint systems-PL research agenda  Find a way to leverage existing systems infrastructure (and/or algorithms)  Build a language (and/or programming model) that can leverage existing systems for large-scale distributed applications