
Consistency Types for Replicated Data in a Higher-order Distributed Programming Language

Philipp Haller
March 15, 2023

Transcript

  1. Consistency Types for Replicated Data
    in a Higher-order Distributed
    Programming Language
    Xin Zhao and Philipp Haller
    KTH Royal Institute of Technology
    Stockholm, Sweden
    International Conference on the Art, Science,
    and Engineering of Programming
    (‹Programming› 2023)
    Tokyo, Japan & Online, March 13-17, 2023

  2. Context: Scalable Distributed Services
    • Distributed applications
      – providing scalable services
      – that run on virtualized cloud infrastructures
      – in one or more datacenters
    • Examples: e-commerce platforms, communication services, social media platforms,
      game servers, etc.
    Support many concurrent clients!

  3. Fault-tolerant Distributed Systems
    Basic problem: multiple clients access shared distributed data concurrently
    [Figure: Client 1, Client 2, and Client 3 issue read, write, and read requests against a single Data node]
    Safety goal: never return an incorrect result under non-Byzantine failure
    conditions, including
    • network delays and partitions, and
    • packet loss, duplication, and reordering

  4. Solution: Replication, Consensus
    Replicate data; distributed consensus ensures safety
    [Figure: Client 1, Client 2, and Client 3 issue read, write, and read requests against three replicated Data nodes]
    Safety goal: never return an incorrect result under non-Byzantine failure
    conditions, including
    • network delays and partitions, and
    • packet loss, duplication, and reordering

  5. Challenges
    Replicate data; distributed consensus ensures safety
    [Figure: Client 1, Client 2, and Client 3 issue read, write, and read requests against three replicated Data nodes]
    • Each update requires distributed consensus
    • Leads to poor performance, poor scalability, and high latency,
      especially in geo-replicated systems

  6. Geo-Distribution Challenge
    • Operating a service in multiple datacenters can improve latency and
      availability for geographically distributed clients
    • Geo-distribution directly supported by today's cloud platforms
    • Challenge: round-trip latency
      – < 2 ms between servers within the same datacenter
      – up to two orders of magnitude higher between distant datacenters
    Naive reuse of single-datacenter application
    architectures and protocols leads to poor performance!

  7. (Partial) Remedy: Eventual Consistency
    Eventual consistency promises better availability and performance than
    strong consistency (= serializing updates in a global total order)
    • Each update executes at some replica (e.g., geographically closest)
      without synchronization
    • Each update is propagated asynchronously to the other replicas
    • All updates eventually take effect at all replicas, possibly in different orders

  8. Eventual Consistency
    • Updates are applied without synchronization
    • Updates/states propagated asynchronously to other replicas
    • All updates eventually take effect at all replicas, possibly in different orders
    Image source: Shapiro, Preguiça, Baquero, and Zawirski: Conflict-Free Replicated Data Types. SSS 2011

  9. Strong Eventual Consistency (SEC)
    • Strong Eventual Consistency (SEC):
      – leverages mathematical properties that ensure absence of conflict,
        i.e., commutativity of update merging
    • A Conflict-Free Replicated Datatype (CRDT)¹ provides SEC:
      – CRDT replicas provably converge to a correct common state
      – CRDTs remain available and scalable despite high network latency,
        failures, or network partitioning
    ¹ Shapiro, Preguiça, Baquero, and Zawirski: Conflict-Free Replicated Data Types. SSS 2011
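The convergence property above can be illustrated with a small state-based CRDT. The following is a minimal sketch of a grow-only counter, not code from the talk: the names `GCounterSketch`, `GCounter`, `increment`, and `merge` are assumptions chosen for illustration. Merging takes the pointwise maximum, which is commutative, associative, and idempotent, so replicas that exchange states in any order end up in the same state.

```scala
object GCounterSketch {
  // State-based grow-only counter: one slot per replica ID (illustrative design).
  final case class GCounter(slots: Map[String, Long] = Map.empty) {
    // Local update: a replica only ever increments its own slot.
    def increment(replica: String, by: Long = 1L): GCounter = {
      require(by >= 0, "a grow-only counter cannot decrease")
      GCounter(slots.updated(replica, slots.getOrElse(replica, 0L) + by))
    }

    // Observed value: sum over all slots.
    def value: Long = slots.values.sum

    // Merge = pointwise maximum over the union of replica IDs.
    def merge(that: GCounter): GCounter =
      GCounter((slots.keySet ++ that.slots.keySet).map { k =>
        k -> math.max(slots.getOrElse(k, 0L), that.slots.getOrElse(k, 0L))
      }.toMap)
  }
}
```

For example, if replica `r1` increments twice and replica `r2` increments by five, merging in either direction yields a counter whose value is the sum of both contributions.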

  10. Consistency Types: Idea
    To satisfy a range of performance, scalability, and consistency requirements,
    provide two different kinds of replicated data types (RDTs):
    1. Consistent data types:
      – Serialize updates in a global total order: sequential consistency
      – Do not provide availability (in favor of partition tolerance²)
    2. Available data types:
      – Guarantee availability and performance (and partition tolerance)
      – Weaken consistency: strong eventual consistency
    ² Gilbert and Lynch: Brewer's conjecture and the feasibility of consistent, available,
    partition-tolerant web services. SIGACT News 33(2), 51-59 (2002)

  11. Generalization: Observable Atomic Consistency (OAC)
    • Provide an RDT storing values of a lattice (actually, a join-semilattice)
      – Example: lattice = non-negative integers where join(x, y) = max(x, y)
    • The RDT supports operations with different consistency levels:
      – a totally-ordered operation (“TOp”) atomically synchronizes the
        replicas upon its execution;
      – a convergent operation (“CvOp”) is commutative; it is processed
        asynchronously.
    Zhao and Haller: Replicated data types that unify eventual consistency and observable atomic
    consistency. J. Log. Algebraic Methods Program. 114: 100561 (2020)
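The lattice example above can be written down directly. This is a minimal sketch under stated assumptions: the trait `JoinSemilattice`, the object `MaxInt`, and the helper `joinAll` are hypothetical names introduced here, not part of OAC or its implementation. It shows that joining replica states with `max` gives the same result regardless of the order in which the states arrive.

```scala
object LatticeSketch {
  // A join-semilattice: join must be commutative, associative, and idempotent.
  trait JoinSemilattice[A] {
    def join(x: A, y: A): A
    // Induced partial order: x <= y iff join(x, y) == y.
    def leq(x: A, y: A): Boolean = join(x, y) == y
  }

  // The slide's example: non-negative integers with join = max.
  object MaxInt extends JoinSemilattice[Int] {
    def join(x: Int, y: Int): Int = {
      require(x >= 0 && y >= 0, "lattice elements are non-negative")
      math.max(x, y)
    }
  }

  // Fold a sequence of replica states into one state, starting from a bottom element.
  def joinAll[A](states: Seq[A], bottom: A, l: JoinSemilattice[A]): A =
    states.foldLeft(bottom)((acc, s) => l.join(acc, s))
}
```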

  12. Observable Atomic Consistency: Example
    • Auction system:
      – RDT maintains highest bidder including bid and ID of bidder
      – State of RDT = (bid: Int, bidderID: Int)
      – Update of (local) state upon submission of new bid:

        def submit(out s: (Int, Int), bid: Int, bidderID: Int) =
          if (bid > s._1)
            s := (bid, bidderID)

    • submit is commutative:

        submit(s, 10, 1); submit(s, 20, 2) -> s == (20, 2)
        submit(s, 20, 2); submit(s, 10, 1) -> s == (20, 2)
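The slide's `submit` uses an `out` parameter and `:=`, which are not plain Scala. A minimal runnable sketch, assuming a pure variant that returns the new state instead of mutating `s` (the object name `AuctionSketch` and the type alias `State` are illustrative), makes the commutativity claim checkable:

```scala
object AuctionSketch {
  type State = (Int, Int) // (bid, bidderID)

  // Pure variant of the slide's submit: returns the new state instead of mutating s.
  def submit(s: State, bid: Int, bidderID: Int): State =
    if (bid > s._1) (bid, bidderID) else s

  val initial: State = (0, 0)

  // The two orders from the slide.
  val order1: State = submit(submit(initial, 10, 1), 20, 2)
  val order2: State = submit(submit(initial, 20, 2), 10, 1)
}
```

Both orders end in `(20, 2)`. Note that with the strict comparison `bid > s._1`, two equal bids from different bidders would not commute, so this sketch, like the slide's example, assumes distinct bid amounts.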

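  13. Observable Atomic Consistency: Example (2)
    Since submitting a bid is commutative, submit can be executed as a CvOp:
    [Figure: replicas R1, R2, and R3 all start in state (0,0). CvOp(submit(20,2)) and CvOp(submit(10,1)) are applied and propagated asynchronously, so replicas pass through states (20,2) and (10,1). Annotation: "No update!" where (10,1) reaches a replica already holding (20,2).]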

  14. Observable Atomic Consistency: Example (3)
    Assume R1 receives a request to close the auction and return the highest bid:
    [Figure: same timeline for replicas R1, R2, and R3 with CvOp(submit(20,2)) and CvOp(submit(10,1)), followed by Op(close()).]
    Should not return (10,1): replicas not consistent!

  15. Observable Atomic Consistency: Example (4)
    Assume R1 receives a request to close the auction and return the highest bid:
    [Figure: replicas R1, R2, and R3 start in state (0,0). After CvOp(submit(20,2)) and CvOp(submit(10,1)), one replica holds (20,2) and two hold (10,1). TOp(close()) triggers distributed consensus, after which all three replicas hold (20,2).]
    Return (20,2)
    Zhao and Haller: Replicated data types that unify eventual consistency and observable
    atomic consistency. J. Log. Algebraic Methods Program. 114: 100561 (2020)
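The difference between a CvOp and a TOp in this example can be simulated in a single process. The sketch below is an assumption-laden toy, not the OAC protocol: the class `Replicas`, its methods, and the tie-breaking rule in `join` are invented for illustration, and `close` merely stands in for distributed consensus by joining all replica states.

```scala
object OacSketch {
  type State = (Int, Int) // (bid, bidderID)

  // Join of two auction states: keep the higher bid (ties broken by bidder ID, an assumption).
  def join(a: State, b: State): State =
    if (a._1 > b._1 || (a._1 == b._1 && a._2 >= b._2)) a else b

  // A toy replica group held in one process; real replicas would exchange messages.
  final class Replicas(n: Int) {
    private val states = Array.fill[State](n)((0, 0))

    // CvOp: applied at one replica only, without synchronization.
    def cvSubmit(replica: Int, bid: Int, bidderID: Int): Unit =
      states(replica) = join(states(replica), (bid, bidderID))

    // Asynchronous propagation of one replica's state to another.
    def deliver(from: Int, to: Int): Unit =
      states(to) = join(states(to), states(from))

    // TOp: stand-in for consensus; all replicas synchronize before the result is returned.
    def close(): State = {
      val agreed = states.reduce((a, b) => join(a, b))
      for (i <- states.indices) states(i) = agreed
      agreed
    }

    def read(replica: Int): State = states(replica)
  }
}
```

Replaying the slide's scenario: after `cvSubmit(1, 20, 2)`, `cvSubmit(2, 10, 1)`, and `deliver(2, 0)`, a plain read at R1 (index 0) still sees `(10, 1)`, whereas `close()` returns `(20, 2)` and leaves every replica in that state.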

  16. A New CvOp
    • Now, we can use both CvOps and TOps with the same RDT
    • Example:
      – Add poll operation to retrieve the current highest bidder
      – In order to ensure high availability, implement poll as CvOp

        def getHighestBid(auctionID: Int): (Int, Int) =
          getRef(auctionID).poll()

        def updateDisplay(auctionID: Int) =
          show("Highest bid: " + getHighestBid(auctionID)._1)

  17. A Notification Service
    • Periodically, the auction service should send a message to all bidders to
      inform them about the current highest bid:

        def notifyAll(auctionID: Int, bidders: List[Int]) = {
          val (hBid, hBidder) = getHighestBid(auctionID)  // may be inconsistent!
          bidders.foreach { bidderID =>
            if (bidderID == hBidder)
              send(bidderID, "You have the highest bid!")
            else
              send(bidderID, "The highest bid is: " + hBid)
          }
        }

    Problem: notification based on inconsistent information!

  18. CTRD: Consistency Types for Replicated Data
    • Type system that distinguishes values according to their consistency
    • Consistency represented as labels attached to types and values
    • A label l can be loc (local), con (consistent), oac (OAC), or ava (available)
    • Labels are ordered, and the label ordering expresses permitted data flow:
      loc → con → oac → ava
    • Labeled types are covariant in their labels; data may not flow against the
      ordering, e.g., from ava to con
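The label ordering and the covariance of labeled types can be sketched as a small checker. This is an illustration under assumptions, not the CTRD formalization: the numeric `rank` encoding, the `LType` representation with a string for the base type, and the function names are all hypothetical.

```scala
object LabelSketch {
  // Labels with a numeric rank that encodes the chain loc, con, oac, ava.
  sealed abstract class Label(val rank: Int)
  case object Loc extends Label(0)
  case object Con extends Label(1)
  case object Oac extends Label(2)
  case object Ava extends Label(3)

  // l1 may flow to l2 iff l1 is at or below l2 in the ordering.
  def flowsTo(l1: Label, l2: Label): Boolean = l1.rank <= l2.rank

  // A labeled type: a base type name plus a consistency label.
  final case class LType(base: String, label: Label)

  // Covariance in the label: same base type, and the label may flow.
  def isSubtype(s: LType, t: LType): Boolean =
    s.base == t.base && flowsTo(s.label, t.label)
}
```

Under this encoding, `isSubtype(LType("Int", Con), LType("Int", Ava))` holds, while the reverse direction from `Ava` to `Con` does not.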

  19. Syntax
    Essentially: STLC extended with (distributed) ML-style references and labels

  20. Select Typing Rules
    • Example 1: t1^con := t2^ava (Illegal!)
    • Example 2: if x^ava then t^con := 1^con else t^con := 0^con (Illegal!)
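Both examples can be rejected by one check if assignments are typed relative to a program-counter label, as is common in information-flow type systems. The sketch below assumes that style; the actual CTRD typing rules are given in the paper and may differ in detail. The names `assignOk`, `ifAssignOk`, and the `pc` parameter are hypothetical.

```scala
object FlowCheckSketch {
  sealed abstract class Label(val rank: Int) {
    // Least upper bound of two labels in the chain loc, con, oac, ava.
    def join(that: Label): Label = if (rank >= that.rank) this else that
  }
  case object Loc extends Label(0)
  case object Con extends Label(1)
  case object Oac extends Label(2)
  case object Ava extends Label(3)

  def flowsTo(a: Label, b: Label): Boolean = a.rank <= b.rank

  // Assignment `target := source` under program-counter label `pc`:
  // both the source label and the pc must flow to the target label.
  def assignOk(target: Label, source: Label, pc: Label = Loc): Boolean =
    flowsTo(source.join(pc), target)

  // `if (cond) t := v1 else t := v0`: the branches are checked with the pc
  // raised by the label of the condition, which catches implicit flows.
  def ifAssignOk(cond: Label, target: Label, value: Label, pc: Label = Loc): Boolean =
    assignOk(target, value, pc.join(cond))
}
```

Example 1 corresponds to `assignOk(Con, Ava)`, a direct flow, and Example 2 to `ifAssignOk(Ava, Con, Con)`, an implicit flow through the condition; both evaluate to `false`.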

  21. Attempted “Fix” 1

        def send(ID: Int@con, msg: String): Unit = ...

        def getHighestBid(auctionID: Int): (Int, Int)@ava =
          getRef(auctionID).poll()

        def notifyAll(auctionID: Int, bidders: List[Int]) = {
          val (hBid, hBidder): (Int, Int)@con =
            getHighestBid(auctionID)
          bidders.foreach { bidderID =>
            if (bidderID == hBidder)
              send(bidderID, "You have the highest bid!")
            else
              send(bidderID, "The highest bid is: " + hBid)
          }
        }

    Requires (Int,Int)@ava <: (Int,Int)@con, which does not hold

  22. Attempted “Fix” 2

        def send(ID: Int@con, msg: String): Unit = ...

        def getHighestBid(auctionID: Int): (Int, Int)@ava =
          getRef(auctionID).poll()

        def notifyAll(auctionID: Int, bidders: List[Int]) = {
          val (hBid, hBidder): (Int, Int)@ava =
            getHighestBid(auctionID)
          bidders.foreach { bidderID =>
            if (bidderID == hBidder)
              send(bidderID, "You have the highest bid!")
            else
              send(bidderID, "The highest bid is: " + hBid)
          }
        }

    Requires Int@ava <: Int@con, which does not hold:
    condition has label ava ⇒ bidderID cannot have con label in the branches!
    Implicit information flow

  23. The Real Fix

        def send(ID: Int@con, msg: String): Unit = ...

        def getHighestBid(auctionID: Int): (Int, Int)@con =
          getRef(auctionID).consistentRead()

        def notifyAll(auctionID: Int, bidders: List[Int]) = {
          val (hBid, hBidder): (Int, Int)@con =
            getHighestBid(auctionID)
          bidders.foreach { bidderID =>
            if (bidderID == hBidder)
              send(bidderID, "You have the highest bid!")
            else
              send(bidderID, "The highest bid is: " + hBid)
          }
        }

    Must strengthen consistency!
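Plain Scala has no `@con` or `@ava` labels, but the direct-flow part of this check can be approximated with phantom types. The following is a rough sketch under assumptions, not the CTRD implementation: the label traits, the wrapper `Labeled`, and `notifyBidders` (renamed to avoid confusion with `AnyRef.notifyAll`) are hypothetical, and `poll` and `consistentRead` are stand-ins that return fixed values instead of contacting replicas.

```scala
object PhantomLabelSketch {
  // The trait hierarchy mirrors the ordering: a more consistent label is a subtype of a weaker one.
  sealed trait Ava
  sealed trait Oac extends Ava
  sealed trait Con extends Oac
  sealed trait Loc extends Con

  // Covariant in the label, so Labeled[Con, A] <: Labeled[Ava, A], but not the reverse.
  final case class Labeled[+L, +A](value: A)

  // Stand-ins for the RDT operations used on the slides (fixed values, no replication).
  def poll(): Labeled[Ava, (Int, Int)] = Labeled((10, 1))           // possibly stale
  def consistentRead(): Labeled[Con, (Int, Int)] = Labeled((20, 2)) // synchronized

  // Requires a con-labeled highest bid, like the final version on the slide.
  def notifyBidders(highest: Labeled[Con, (Int, Int)], bidders: List[Int]): List[(Int, String)] = {
    val (hBid, hBidder) = highest.value
    bidders.map { id =>
      if (id == hBidder) id -> "You have the highest bid!"
      else id -> ("The highest bid is: " + hBid)
    }
  }

  val ok: List[(Int, String)] = notifyBidders(consistentRead(), List(1, 2))
  // notifyBidders(poll(), List(1, 2)) // does not compile: an ava-labeled value is not con-labeled
}
```

This encoding only rejects direct flows such as the one in Attempted “Fix” 1. Implicit flows through conditions, as in Attempted “Fix” 2, are not tracked by Scala's type checker and need a dedicated type system such as the one presented in the talk.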

  24. Results
    • Distributed small-step operational semantics
    • Formalizes RDTs including observable atomic consistency; operations via
      message passing
    • Proofs of correctness properties:
      • Type soundness (preservation + progress) ⇒ no run-time label violations!
      • Noninterference
        E.g., mutation of ava-labeled references cannot be observed via con-labeled
        values
    • Proofs of consistency properties:
      • Theorem: For con operations, CTRD ensures sequential consistency
      • Theorem: For ava operations, CTRD ensures eventual consistency

  25. Selected Related Work (1)
    • Inconsistent, Performance-bound, Approximate (IPA)³ storage system
      – Goals: consistency safety and error-bounded consistency
      – Limitations:
        • Only direct invalid information flows prevented
        • No proof of type soundness
      – Our work:
        • Also prevents implicit invalid information flows
        • Provides proofs of correctness and consistency properties
    ³ Holt, Bornholt, Zhang, Ports, Oskin, Ceze: Disciplined Inconsistency with Consistency Types.
    SoCC 2016: 279-293

  26. Selected Related Work (2)
    • ConSysT⁴ language for distributed systems
      – Integrates consistency and availability with an object-oriented
        programming model
      – Provides correctness proof for an OO core calculus
      – Provides an implementation as a Java extension and middleware
    • Our work:
      – ML-style higher-order functional language
      – Integrates observable atomic consistency (OAC) for increased flexibility
      – First published proofs of type soundness and noninterference (LCPC ’19)
    ⁴ Köhler, Eskandani, Weisenburger, Margara, Salvaneschi: Rethinking safe consistency in distributed
    object-oriented programming. Proc. ACM Program. Lang. 4(OOPSLA): 188:1-188:30 (2020)

  27. Conclusion
    CTRD: Consistency Types for Replicated Data
    • A distributed, higher-order language with replicated types and consistency labels
    • Enables safe mixing of strongly consistent and available (weakly consistent) data
    • Proofs of type soundness and noninterference, and consistency properties
    • Integrates observable atomic consistency, which provides high availability through
      convergent operations and strong consistency through totally-ordered operations
    Future work:
    • Practical implementation integrated with OACP
    • Going beyond the RDT abstraction
    Thanks!