
Deterministic Concurrency for Reliable, Large-Scale Software Systems

Philipp Haller
February 22, 2018

Transcript

  1. Philipp Haller
    Deterministic Concurrency for
    Reliable, Large-Scale Software
    Systems
    Philipp Haller
    KTH Royal Institute of Technology
    Stockholm, Sweden
    MPI-SWS, Kaiserslautern, February 22nd, 2018

  2. Philipp Haller
    Have we solved concurrent programming, yet?
    The problem with concurrency:
    – It’s difficult:
    • Race conditions
    • Deadlocks
    • Livelocks
    • Fairness violations
    • …

  3. Philipp Haller
    Non-determinism
    • At the root of several hazards
    • Example 1:

    @volatile var x = 0
    def m(): Unit = {
      Future {
        x = 1
      }
      Future {
        x = 2
      }
      .. // does not access x
    }

    What’s the value of x when an invocation of m returns?
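
    For concreteness, a minimal self-contained rendering of Example 1 is sketched below. It is an assumption-laden sketch: it uses the global ExecutionContext and wraps the slide's code in an object so that it compiles as written. After m returns, x may be observed as 0, 1, or 2, depending on how the two futures are scheduled.

    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    object Example1 {
      @volatile var x = 0

      def m(): Unit = {
        Future { x = 1 }  // scheduled concurrently, may run before or after m returns
        Future { x = 2 }  // scheduled concurrently, ordering w.r.t. the other future is arbitrary
        // the body of m itself does not read x
      }
    }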


  4. Philipp Haller
    Reordering not always a problem!
    • Example 2:

    val set = Set.empty[Int]
    Future {
      set.put(1)
    }
    set.put(2)

    Eventually, set contains both 1 and 2, always.
    Bottom line: it depends on the datatype


  5. Philipp Haller
    … and its operations!
    • Example 3:

    val set = Set.empty[Int]
    Future {
      set.put(1)
    }
    Future {
      if (set.contains(1)) {
        ..
      }
    }
    set.put(2)

    Result depends on schedule!
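
    A hedged, runnable rendering of Examples 2 and 3. The slides' set with put is schematic; this sketch substitutes a JDK concurrent set (add instead of put) and the global ExecutionContext, and wraps everything in an App object so it compiles as written.

    import java.util.concurrent.ConcurrentHashMap
    import scala.concurrent.Future
    import scala.concurrent.ExecutionContext.Implicits.global

    object SetExamples extends App {
      val set = ConcurrentHashMap.newKeySet[Int]()   // thread-safe set

      Future { set.add(1) }      // Example 2: the order of the two adds does not
      set.add(2)                 // affect the final contents: eventually {1, 2}

      Future {                   // Example 3: this read races with the add above,
        if (set.contains(1)) {   // so whether the branch is taken depends on the
          println("saw 1")       // schedule
        }
      }
    }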


  6. Philipp Haller
    Goal
    • Programming model providing static determinism guarantees
    • Starting from an imperative, object-oriented language with:
    – global state
    – pervasive aliasing
    • Important concerns: expressivity and performance

  7. Philipp Haller
    Approach
    • Extend a future-style programming model with:
    – Lattice-based datatypes (crucial for determinism!)
    – Quiescence (crucial for determinism!)
    – Resolution of cyclic dependencies (increases expressivity!)
    • Extend Scala's type system for static safety


  8. Philipp Haller
    Programming model
    Programming model based on two core abstractions: cells and cell completers
    – Cell = shared “variable”
    – Cell[K,V]: read-only interface; read values of type V
    – CellCompleter[K,V]: write values of type V to its associated cell (monotonic updates)
    – V must have an instance of a lattice type class
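
    A rough sketch of how these two abstractions and the lattice type class might look as Scala traits. This is an assumption-laden outline based only on what the slides state (read-only Cell, write-side CellCompleter, monotonic joins); the actual Reactive Async API differs in names and details.

    trait Lattice[V] {
      def empty: V                      // bottom element of the lattice
      def join(left: V, right: V): V    // least upper bound
    }

    trait Cell[K, V] {                  // read-only interface
      def getResult(): V                // current (monotonically growing) value
    }

    trait CellCompleter[K, V] {         // write side for the associated cell
      def cell: Cell[K, V]
      def putNext(value: V): Unit       // new value = join(old value, value)
      def freeze(): Unit                // later writes fail (see "Freeze" below)
    }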


  9. Philipp Haller
    Example
    • Given a social graph
    – vertex = user
    • Traverse graph and collect IDs of “interesting” users
    • Graph large ➟ concurrent traversal

  10. Philipp Haller
    Cell with user IDs
    (code simplified)
    implicit object IntSetLattice extends Lattice[Set[Int]] {   // bounded join-semilattice
      val empty = Set()
      def join(left: Set[Int], right: Set[Int]) = left ++ right
    }

    val userIDs = CellCompleter[Set[Int]]

    // add a user ID
    userIDs.putNext(Set(theUserID))

    [1] Oliveira, Moors, and Odersky. Type classes as objects and implicits. OOPSLA 2010


  11. Philipp Haller
    Reading results
    • Problem: when reading a cell’s value, how do we know this value is not
    going to change any more?
    – There may still be ongoing concurrent activities
    – Manual synchronization (e.g., latches) error-prone

  12. Philipp Haller
    Freeze
    • Introduce operation to freeze cells
    • Attempting to mutate a frozen cell results in a failure
    • May only read from frozen cells
    – Ensures only unchangeable values are read
    – Weakens the determinism guarantee to "quasi-determinism":
    "All non-failing executions compute the same cell values."


  13. Philipp Haller
    Reading results: alternative approach
    • Problem: when reading a cell’s value, how do we know this value is not
    going to change any more?
    – There may still be ongoing concurrent activities
    • Alternative solution: Quiescence
    • Stronger than quasi-determinism

  14. Philipp Haller
    Quiescence
    • Intuitively: situation when values of cells are guaranteed not to change any
    more
    • Technically:
    – No concurrent activities ongoing or scheduled which could change
    values of cells
    – Detected by the underlying thread pool

  15. Philipp Haller
    Revisiting the example
    val pool = new HandlerPool
    val userIDs = CellCompleter[Set[Int]](pool)

    // add a user ID
    userIDs.putNext(Set(theUserID))
    ..

    // register handler
    // upon quiescence: read result value of cell
    pool.onQuiescent(userIDs.cell) { collectedIDs =>
      ..
    }

    Safe to read from the cell when the pool is quiescent!


  16. Philipp Haller
    Reactive threshold reads*
    • When value of a cell "crosses" a threshold set: update another cell
    – "Crosses" = new value greater than one of the values in the threshold
    set
    16
    cell2.whenNext(cell1, Set(v1, v2, v3)) { v =>
    // compute update for cell2
    }
    * non-reactive threshold reads:

    [2] Kuper, Turon, Krishnaswami, and Newton. Freeze after writing: Quasi-deterministic
    parallel programming with LVars. POPL 2014


  17. Philipp Haller
    Reactive threshold reads
    • Determinism requires restricting the threshold set
    • Example: powerset lattice
    Init state: cell1 = Set(0), cell2 = Set(0)

    cell2.whenNext(cell1, Set(Set(2), Set(2, 5))) { v =>
      NextOutcome(v)
    }

    • Permits non-deterministic executions:

    cell1.putNext(Set(2))
    cell1.putNext(Set(2, 5))
    => handler sees Set(2)

    cell1.putNext(Set(2, 5))
    cell1.putNext(Set(2))
    => handler sees Set(2, 5)


  18. Philipp Haller
    Reactive threshold reads
    • Determinism requires restricting the threshold set
    • Restriction: incompatibility of the elements of the threshold set
    – v1, v2 incompatible iff LUB(v1, v2) = Top
    • Concurrent crossings of the threshold set due to different elements yield failed executions
    – Turn potential non-determinism into failure, thus preserving quasi-determinism
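
    To make the restriction concrete, here is a small, hedged sketch of an incompatibility check for a powerset lattice over a fixed universe (the universe, the helper names, and the check itself are illustrative assumptions, not the library's API). It also confirms that the threshold set from the previous slide, {Set(2), Set(2, 5)}, would be rejected: the join of its two elements is not Top.

    // Powerset lattice over {0, ..., 9}: Top is the full universe.
    val universe: Set[Int] = (0 to 9).toSet

    def join(a: Set[Int], b: Set[Int]): Set[Int] = a union b

    // A threshold set is admissible if its elements are pairwise incompatible,
    // i.e., every pairwise join is Top.
    def pairwiseIncompatible(thresholds: Set[Set[Int]]): Boolean =
      thresholds.toSeq.combinations(2).forall {
        case Seq(v1, v2) => join(v1, v2) == universe
      }

    pairwiseIncompatible(Set(Set(2), Set(2, 5)))                                   // false: rejected
    pairwiseIncompatible(Set(Set(0, 1, 2, 3, 4), universe -- Set(0, 1, 2, 3, 4)))  // true: admissible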

  19. Philipp Haller
    Source of data races: variable capture
    • Variable capture by closures passed to whenNext
    var x = 0
    cell2.whenNext(cell1, Set(1)) { v =>
      NextOutcome(x)
    }
    cell3.whenNext(cell1, Set(1)) { v =>
      x = 1
      NoOutcome
    }
    cell1.putNext(1)

    Solution: use spores [3] to prevent
    • re-assigning captured variables
    • capturing mutable, shared data structures

    [3] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
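
    A hedged sketch of what the spore-based fix looks like, assuming the scala.spores macro library described in [3]. The point is that capture becomes explicit and is restricted to immutable vals in the spore header; passing only spores (rather than arbitrary closures) to whenNext rules out the race above.

    import scala.spores._

    var x = 0

    // Rejected at compile time: a spore body may refer only to its parameters
    // and to the vals declared in its header, so re-assigning (or reading) the
    // enclosing var x directly is not allowed.
    // val bad = spore { (v: Int) => x = 1 }

    // Accepted: the current value of x is copied into the spore header as a
    // val; the closure can no longer observe or cause concurrent writes to x.
    val ok = spore {
      val capturedX = x
      (v: Int) => capturedX + v
    }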


  20. Philipp Haller
    Other sources of data races
    object Global {
      var f: Int = 0
    }

    class C {
      def set(x: Int): Unit =
        Global.f = x
      def get: Int =
        Global.f
    }

    cell2.whenNext(cell1, Set(1)) { v =>
      val c = new C
      NextOutcome(c.get)
    }
    cell3.whenNext(cell1, Set(1)) { v =>
      val c = new C
      c.set(1)
      NoOutcome
    }
    cell1.putNext(1)


  21. Philipp Haller
    Solution
    • Restrict instantiation within whenNext closures to classes conforming to the object capability model [4]
    • A class is conformant* ("ocap") iff
    – its methods only access parameters and this
    – its methods only instantiate ocap classes
    – types of fields and method parameters are ocap

    * simplified
    [4] Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, 2006
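
    An illustrative contrast under these (simplified) rules; the class names below are made up, and Global is the mutable singleton from the previous slide, repeated here so the snippet is self-contained.

    object Global {                // global, mutable singleton (as on the previous slide)
      var f: Int = 0
    }

    class NotOcap {                // NOT ocap: its method reaches out to Global.f,
      def bump(): Unit =           // which is neither a parameter nor part of this
        Global.f = Global.f + 1
    }

    class Counter(init: Int) {     // ocap: methods touch only `this` and their
      private var n = init         // parameters, all field/parameter types (Int)
      def add(k: Int): Unit =      // are ocap, and no non-ocap class is instantiated
        n = n + k
      def get: Int = n
    }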


  22. Philipp Haller
    How practical is the object capability requirement?
    • How common is object-capability safe code in Scala?
    • Empirical study of >75,000 SLOC of open-source code

  23. Philipp Haller
    How practical is the object capability requirement?
    • Results:

  24. Philipp Haller
    In the paper
    • Formalization: object-oriented core languages
    • CLC1: type-based notion of object capabilities
    • CLC2: uniqueness via flow-insensitive permissions
    • CLC3: concurrent extension
    • Soundness proof
    • Isolation theorem for processes with shared heap
    [5] Haller and Loiko. LaCasa: Lightweight affinity and object capabilities in Scala. OOPSLA 2016


  25. Philipp Haller
    Evaluation
    • Static analysis of JVM bytecode using the OPAL framework (OPen extensible Analysis Library)
    – New Scala-based, extensible bytecode toolkit
    – Fully concurrent
    • Rewrote purity analysis and immutability analysis
    • Ran analysis on JDK 8 update 45 (rt.jar)
    – 18,591 class files, 163,268 methods, 77,128 fields
    http://www.opal-project.de


  26. Philipp Haller
    Results: immutability analysis
    • RA about 10x faster than FPCF
    • RA = 294 LOC, FPCF = 424 LOC (1.44x)
    [Box plots of Reactive Async vs. FPCF runtimes (secs.) for 1, 2, 4, 8, and 16 threads; whiskers = min/max, top/bottom of box = 1st and 3rd quartile, band in box = median. FPCF = OPAL’s fixed point computation framework.]


  27. Philipp Haller
    Results: purity analysis
    • Best configs. (FPCF: 4 thr, RA: 16 thr): RA 1.75x faster
    • RA = 113 LOC, FPCF = 166 LOC (1.47x)
    [Box plots of Reactive Async vs. FPCF runtimes (secs.) for 1, 2, 4, 8, and 16 threads]


  28. Philipp Haller
    Benchmarks: Monte Carlo simulation
    • Compute Net Present Value
    • 14-16 threads: Scala futures 23-36% faster than RA
    [Runtime (ms) vs. number of threads (1–16) for Sequential, Reactive Async, and Scala Futures 2.11.7; inset zooms in on 14–16 threads]


  29. Philipp Haller
    Benchmarks: parallel sum
    • Parallel sum of large collection of random ints
    • Performance competitive with Scala’s futures
    [Runtime (ms) vs. number of threads (1–16) for Sequential, Reactive Async, and Scala Futures 2.11.7]


  30. Philipp Haller
    Reactive Async: Conclusion
    • Deterministic concurrent programming model
    – Extension of imperative, object-oriented base language
    – Lattices and quiescence for determinism
    – Resolution of cyclic dependencies
    – Type system for object capabilities for safety
    • First experimental results
    • Ongoing and future work:
    – Complete formal development
    – Implement state-of-the-art static analyses

  31. Philipp Haller
    Changing gears: distribution
    Large-scale web applications, IoT applications, serverless computing, etc.
    Distribution essential for:
    – Resilience
    – Elasticity (subsumes scalability)
    – Physically distributed systems
    – Availability


  32. Philipp Haller
    Lineage/provenance
    Which resources are required for producing a particular expected result?
    Lineage may record information about:
    – Data sets read/transformed for producing a result data set
    – Services used for producing a response
    – Etc.
    Provides valuable information about where to inject faults: lineage-driven fault injection (LDFI) [1]
    [1] Peter Alvaro et al. Lineage-driven fault injection. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15)


  33. Philipp Haller
    Distributed programming with functional lineages, a.k.a. function passing
    New data-centric programming model for functional processing of distributed data.
    Key ideas:
    – Provide lineages by programming abstractions
    – Keep data stationary (if possible), send functions
    – Utilize lineages for fault injection and recovery


  34. Philipp Haller
    Introducing the function passing model
    Consists of 3 parts:
    – Silos: stationary, typed, immutable data containers
    – SiloRefs: references to local or remote Silos
    – Spores: safe, serializable functions


  35. Philipp Haller
    The function passing model
    Some visual intuition:
    [Diagram: a master node holds SiloRefs that refer to Silos on worker nodes]


  36. Philipp Haller
    Silos
    What are they? Two parts:
    – Silo[T]: typed, stationary data container holding a value of type T.
    – SiloRef[T]: handle to a Silo.
    The user interacts with the SiloRef.
    SiloRefs come with 4 primitive operations: apply, send, persist, unpersist.


  37. Philipp Haller
    Silos
    Primitive: apply (deferred)

    def apply[S](fun: T => SiloRef[S]): SiloRef[S]

    Takes a function that is to be applied to the data in the silo associated with the SiloRef.
    Creates a new silo to contain the data that the user-defined function returns; evaluation is deferred.
    Enables interesting computation DAGs.


  38. Philipp Haller
    Silos
    Primitive: send (eager)

    def send(): Future[T]

    Forces the built-up computation DAG to be sent to the associated node and applied.
    The Future is completed with the result of the computation.


  39. Philipp Haller
    Silos
    Silo factories (deferred)

    object SiloRef {
      def populate[T](host: Host, value: T): SiloRef[T]
      def fromTextFile(host: Host, file: File): SiloRef[List[String]]
      ...
    }

    Creates a silo on the given host, populated with the given value/text file/…
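
    For reference, the SiloRef operations from the last three slides collected into one trait. This is a hedged sketch: apply and send follow the signatures shown on the slides, while the persist/unpersist signatures are assumptions.

    import scala.concurrent.Future

    trait SiloRef[T] {
      def apply[S](fun: T => SiloRef[S]): SiloRef[S]  // deferred: extends the DAG
      def send(): Future[T]                           // eager: ships and runs the DAG
      def persist(): SiloRef[T]                       // assumed: mark the result for caching
      def unpersist(): SiloRef[T]                     // assumed: drop the cached silo
    }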


  40. Philipp Haller
    Basic idea: apply/send
    [Diagram: machine 1 holds a SiloRef[T] referring to a Silo[T] on machine 2; applying a function T ⇒ SiloRef[S] and sending the resulting DAG produces a new Silo[S] on machine 2, referred to by a SiloRef[S] on machine 1]


  41. Philipp Haller
    More involved example: let’s make an interesting DAG!
    [Diagram: machine 1 holds persons: SiloRef[List[Person]], referring to a Silo[List[Person]] on machine 2]

    val persons: SiloRef[List[Person]] = ...


  42. Philipp Haller
    More involved example (continued)

    val persons: SiloRef[List[Person]] = ...

    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })


  43. Philipp Haller
    More involved example (continued)

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })

    // adults that own a vehicle
    val owners = adults.apply(spore {
      val localVehicles = vehicles // spore header
      ps =>
        localVehicles.apply(spore {
          val localps = ps // spore header
          vs =>
            SiloRef.populate(currentHost,
              localps.flatMap(p =>
                // list of (p, v) for a single person p
                vs.flatMap { v =>
                  if (v.owner.name == p.name) List((p, v))
                  else Nil
                }
              )
            )
        })
    })


  44. Philipp Haller
    More involved example (continued)
    (the DAG now contains persons, vehicles, adults, and owners)

    // adults that own a vehicle
    val owners = adults.apply(...)


  45. Philipp Haller
    More involved example (continued)

    val sorted =
      adults.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.sortBy(p => p.age))
      })

    val labels =
      sorted.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.map(p => "Hi, " + p.name))
      })


  46. Philipp Haller
    More involved example (continued)
    (same DAG as on the previous slides: persons, vehicles, adults, owners, sorted, labels)

    So far we have just staged the computation; we haven’t yet “kicked it off”.


  47. Philipp Haller
    More involved example (continued)

    labels.persist().send()

    [Diagram: the composed function List[Person] ⇒ List[String] is shipped to machine 2, which materializes a new Silo[List[String]]]
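
    A hedged usage sketch of consuming the result: send() returns a standard Scala Future (per the signature on the send slide), so the materialized labels can be awaited or transformed as usual. The timeout and the printed sample below are illustrative only.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration._

    val greetings: Future[List[String]] = labels.persist().send()
    val result: List[String] = Await.result(greetings, 30.seconds)
    result.take(3).foreach(println)   // e.g. "Hi, Alice", ...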


  48. Philipp Haller
    A functional design for fault-tolerance: putting lineages to work
    A SiloRef is a lineage, a persistent (in the sense of functional programming) data structure.
    The lineage is the DAG of operations used to derive the data of a silo.
    Since the lineage is composed of spores [2], it is serializable. This means it can be persisted or transferred to other machines.
    [2] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP '14


  49. Philipp Haller
    A functional design for fault-tolerance: putting lineages to work
    Next: we formalize lineages, a concept from the database + systems communities, in the context of PL. A natural fit in the context of functional programming!
    Formalization: typed, distributed core language with spores, silos, and futures.


  50. Philipp Haller
    Abstract syntax


  51. Philipp Haller
    Local reduction and lineages


  52. Philipp Haller
    Distributed reduction


  53. Philipp Haller
    Type assignment


  54. Philipp Haller
    Properties of function passing model
    Formalization
    Subject reduction theorem guarantees
    preservation of types under reduction, as well as
    preservation of lineage mobility.
    Progress theorem guarantees the finite
    materialization of remote, lineage-based data.
    First correctness results for a programming model
    for lineage-based distributed computation.


  55. Philipp Haller
    Properties
    Liveness property: finite materialization


  56. Philipp Haller
    Paper
    Details, proofs, etc.
    Haller, Miller, and Müller. A Programming Model and Foundation for Lineage-Based Distributed Computation. Journal of Functional Programming, 2018, to appear.

    https://infoscience.epfl.ch/record/230304


  57. Philipp Haller
    Building applications with function passing
    Built two miniaturized example systems inspired by popular big data frameworks:
    – BabySpark (distributed collections): implemented Spark RDD operators in terms of the primitives of function passing: map, reduce, groupBy, and join.
    – MBrace (F# async for distributing tasks): emulated MBrace using the primitives of function passing.
    See https://github.com/phaller/f-p/
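
    To connect this back to the primitives, here is a hedged sketch of how an RDD-style map over partitioned data could be written with apply, spores, and SiloRef.populate. DIntList, its shape, and the use of currentHost here are illustrative assumptions, not the BabySpark code.

    import scala.spores._

    // A distributed list of Ints: one silo of elements per partition.
    case class DIntList(partitions: List[SiloRef[List[Int]]]) {
      def map(f: Spore[Int, Int]): DIntList =
        DIntList(partitions.map { part =>
          part.apply(spore {
            val localF = f                      // spore header: explicit capture
            (xs: List[Int]) =>
              SiloRef.populate(currentHost, xs.map(localF))
          })
        })
    }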


  58. Philipp Haller
    Conclusion
    • Exploring practical deterministic concurrency
    – For an imperative, object-oriented language
    – Leverage recent advances in type systems
    • Exploring lineage-based distributed programming
    – First correctness results for a programming model based on lineages
    • Finite materialization of distributed, lineage-based data
    • Goal: high expressivity in distributed setting with shared state and fault
    tolerance