
Deterministic Concurrency for Reliable, Large-Scale Software Systems

Philipp Haller
February 22, 2018

Transcript

  1. Philipp Haller Deterministic Concurrency for Reliable, Large-Scale Software Systems

     Philipp Haller, KTH Royal Institute of Technology, Stockholm, Sweden
     MPI-SWS, Kaiserslautern, February 22nd, 2018
  2. Philipp Haller Have we solved concurrent programming, yet? The problem

    with concurrency: – It’s difficult: • Race conditions • Deadlocks • Livelocks • Fairness violations • … 2
  3. Philipp Haller Non-determinism • At the root of several hazards • Example 1:

     @volatile var x = 0
     def m(): Unit = {
       Future { x = 1 }
       Future { x = 2 }
       .. // does not access x
     }

     What’s the value of x when an invocation of m returns?
  4. Philipp Haller Reordering not always a problem! • Example 2:

     val set = Set.empty[Int]
     Future { set.put(1) }
     set.put(2)

     Eventually, set contains both 1 and 2, always. Bottom line: it depends on the datatype
  5. Philipp Haller … and its operations! • Example 3:

     val set = Set.empty[Int]
     Future { set.put(1) }
     Future { if (set.contains(1)) { .. } }
     set.put(2)

     Result depends on schedule!
  6. Philipp Haller Goal • Programming model providing static determinism guarantees

    • Starting from imperative, object-oriented language – global state – pervasive aliasing • Important concerns: expressivity and performance 6
  7. Philipp Haller Approach • Extend a future-style programming model with:

    – Lattice-based datatypes – Quiescence – Resolution of cyclic dependencies • Extend Scala's type system for static safety 7 Crucial for determinism! Increases expressivity!
  8. Philipp Haller Programming model • Programming model based on two core abstractions: cells and cell completers – Cell = shared “variable” – Cell[K, V]: read-only interface; read values of type V – CellCompleter[K, V]: write values of type V to its associated cell (monotonic updates) – V must have an instance of a lattice type class (a minimal API sketch follows below)
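     A minimal sketch of how these two abstractions might fit together, assuming the Lattice type class and the CellCompleter/putNext operations shown on the following slides; member names like getResult and the single type parameter (key parameter K omitted) are assumptions, not the talk's exact API:

       trait Lattice[V] {
         def empty: V                    // bottom element
         def join(left: V, right: V): V  // least upper bound
       }

       trait Cell[V] {                   // read-only view of a shared "variable"
         def getResult(): V              // current, monotonically growing value
       }

       trait CellCompleter[V] {          // write end
         def cell: Cell[V]
         def putNext(v: V): Unit         // joins v into the cell's current value
       }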
  9. Philipp Haller Example • Given a social graph – vertex = user • Traverse graph and collect IDs of “interesting” users • Graph large ➟ concurrent traversal
  10. Philipp Haller Cell with user IDs (code simplified)

     implicit object IntSetLattice extends Lattice[Set[Int]] {
       val empty = Set()
       def join(left: Set[Int], right: Set[Int]) = left ++ right
     }
     // Set[Int] with union: a bounded join-semilattice

     val userIDs = CellCompleter[Set[Int]]

     // add a user ID
     userIDs.putNext(Set(theUserID))

     [1] Oliveira, Moors, and Odersky. Type classes as objects and implicits. OOPSLA 2010
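     For illustration only (not from the talk), the same Lattice type class admits other instances; for example, non-negative integers ordered by <= with join = max, so a cell of this type only ever grows towards the largest value written to it:

       // Hypothetical second Lattice instance, using the trait from the slide above.
       implicit object MaxIntLattice extends Lattice[Int] {
         val empty = 0                                       // bottom element
         def join(left: Int, right: Int) = math.max(left, right)
       }

       // maxAge.putNext(42); maxAge.putNext(17)  // cell value remains 42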
  11. Philipp Haller Reading results • Problem: when reading a cell’s

    value, how do we know this value is not going to change any more? – There may still be ongoing concurrent activities – Manual synchronization (e.g., latches) error-prone 11
  12. Philipp Haller Freeze • Introduce operation to freeze cells • Attempting to mutate a frozen cell results in a failure • May only read from frozen cells – Ensures only unchangeable values are read – Weakens the determinism guarantee to “quasi-determinism”: “All non-failing executions compute the same cell values.”
  13. Philipp Haller Reading results: alternative approach • Problem: when reading

    a cell’s value, how do we know this value is not going to change any more? – There may still be ongoing concurrent activities • Alternative solution: Quiescence • Stronger than quasi-determinism 13
  14. Philipp Haller Quiescence • Intuitively: situation when values of cells

    are guaranteed not to change any more • Technically: – No concurrent activities ongoing or scheduled which could change values of cells – Detected by the underlying thread pool 14
  15. Philipp Haller Revisiting the example

     val pool = new HandlerPool
     val userIDs = CellCompleter[Set[Int]](pool)

     // add a user ID
     userIDs.putNext(Set(theUserID))
     ..

     // register handler; upon quiescence: read result value of cell
     pool.onQuiescent(userIDs.cell) { collectedIDs => .. }

     Safe to read from cell when pool quiescent! (An end-to-end sketch follows below.)
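     A minimal end-to-end sketch of the traversal example under the API shown on this and the preceding slides (HandlerPool, CellCompleter, putNext, onQuiescent). How traversal tasks get scheduled on the pool is not shown in the talk; pool.execute, graph, and isInteresting are assumptions here:

       val pool    = new HandlerPool
       val userIDs = CellCompleter[Set[Int]](pool)

       for (user <- graph.vertices)           // graph: assumed social graph
         pool.execute { () =>                 // assumed way to run a task on the pool
           if (isInteresting(user))           // isInteresting: assumed predicate
             userIDs.putNext(Set(user.id))    // monotonic join into the cell
         }

       // Read only once no more updates can happen.
       pool.onQuiescent(userIDs.cell) { collectedIDs =>
         println(s"collected ${collectedIDs.size} interesting user IDs")
       }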
  16. Philipp Haller Reactive threshold reads* • When the value of a cell “crosses” a threshold set: update another cell – “Crosses” = new value greater than one of the values in the threshold set

     cell2.whenNext(cell1, Set(v1, v2, v3)) { v =>
       // compute update for cell2
     }

     * non-reactive threshold reads: [2] Kuper, Turon, Krishnaswami, and Newton. Freeze after writing: Quasi-deterministic parallel programming with LVars. POPL 2014
  17. Philipp Haller Reactive threshold reads • Determinism requires restricting the threshold set • Example: powerset lattice

     cell2.whenNext(cell1, Set(Set(2), Set(2, 5))) { v =>
       NextOutcome(v)
     }

     Init state: cell1 = Set(0), cell2 = Set(0)

     • Permits non-deterministic executions:
       – cell1.putNext(Set(2)); cell1.putNext(Set(2, 5))  =>  handler sees Set(2)
       – cell1.putNext(Set(2, 5)); cell1.putNext(Set(2))  =>  handler sees Set(2, 5)
  18. Philipp Haller Reactive threshold reads • Determinism requires restricting the threshold set • Restriction: incompatibility of the elements of the threshold set – v1, v2 incompatible iff LUB(v1, v2) = Top • Concurrent crossings of the threshold set due to different elements yield failed executions – Turn potential non-determinism into failure, thus preserving quasi-determinism (see the sketch below)
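     A minimal sketch of what this restriction amounts to, assuming the talk's Lattice type class is extended with a top element (the slides only show empty and join; LatticeWithTop is an assumption):

       trait LatticeWithTop[V] extends Lattice[V] { def top: V }

       // v1, v2 are incompatible iff their least upper bound is Top.
       def incompatible[V](v1: V, v2: V)(implicit lat: LatticeWithTop[V]): Boolean =
         lat.join(v1, v2) == lat.top

       // A threshold set is admissible if its elements are pairwise incompatible,
       // so at most one of them can be "crossed" first in any execution.
       def validThresholdSet[V](ts: Set[V])(implicit lat: LatticeWithTop[V]): Boolean =
         ts.toSeq.combinations(2).forall { case Seq(a, b) => incompatible(a, b) }

     Under this check, the threshold set from the previous slide, Set(Set(2), Set(2, 5)), would be rejected: LUB(Set(2), Set(2, 5)) = Set(2, 5), not Top.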
  19. Philipp Haller Source of data races: variable capture • Variable capture by closures passed to whenNext

     var x = 0
     cell2.whenNext(cell1, Set(1)) { v =>
       NextOutcome(x)
     }
     cell3.whenNext(cell1, Set(1)) { v =>
       x = 1
       NoOutcome
     }
     cell1.putNext(1)

     Solution: use spores [3] to prevent • re-assigning captured variables • capturing mutable, shared data structures (a sketch follows below)
     [3] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
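     A minimal sketch of the spore discipline, using the spore-header syntax that appears later in this talk (slide 41). Whether whenNext accepts a spore in exactly this way is an assumption; the point is that captures are declared explicitly and are immutable:

       var x = 0
       cell2.whenNext(cell1, Set(1)) {
         spore {
           val x0 = x                  // spore header: explicit, by-value capture
           (v: Int) => NextOutcome(x0)
         }
       }
       // A handler body containing `x = 1` would be rejected: x is not a declared
       // capture, and declared captures cannot be re-assigned.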
  20. Philipp Haller Other sources of data races

     object Global { var f: Int = 0 }

     class C {
       def set(x: Int): Unit = Global.f = x
       def get: Int = Global.f
     }

     cell2.whenNext(cell1, Set(1)) { v =>
       val c = new C
       NextOutcome(c.get)
     }
     cell3.whenNext(cell1, Set(1)) { v =>
       val c = new C
       c.set(1)
       NoOutcome
     }
     cell1.putNext(1)
  21. Philipp Haller Solution • Restrict instantiation within whenNext closures to classes conforming to the object capability model [4] • A class is conformant* ("ocap") iff – its methods only access parameters and this – its methods only instantiate ocap classes – types of fields and method parameters are ocap (an ocap rewrite of class C is sketched below) * simplified
     [4] Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, 2006
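     For illustration (this rewrite is mine, not from the talk): an ocap-conformant variant of class C from slide 20 keeps its state in the instance instead of reaching for the global object Global, so a handler can only mutate state it created or was explicitly handed:

       // Hypothetical ocap-conformant counter: methods only touch `this` and parameters.
       final class Counter(private var f: Int) {
         def set(x: Int): Unit = { f = x }
         def get: Int = f
       }

       cell2.whenNext(cell1, Set(1)) { v =>
         val c = new Counter(0)   // freshly created, unshared state: no data race
         NextOutcome(c.get)
       }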
  22. Philipp Haller How practical is the object capability requirement? •

    How common is object-capability safe code in Scala? • Empirical study of >75,000 SLOC of open-source code 22
  23. Philipp Haller In the paper • Formalization: object-oriented core languages

    • CLC1: type-based notion of object capabilities • CLC2: uniqueness via flow-insensitive permissions • CLC3: concurrent extension • Soundness proof • Isolation theorem for processes with shared heap 24 [5] Haller and Loiko. LaCasa: Lightweight affinity and object capabilities in Scala. OOPSLA 2016
  24. Philipp Haller Evaluation • Static analysis of JVM bytecode using

    the OPAL framework (OPen extensible Analysis Library) – New Scala-based, extensible bytecode toolkit – Fully concurrent • Rewrote purity analysis and immutability analysis • Ran analysis on JDK 8 update 45 (rt.jar) – 18’591 class files, 163’268 methods, 77’128 fields 25 http://www.opal-project.de
  25. Philipp Haller Results: immutability analysis • RA about 10x faster than FPCF • RA = 294 LOC, FPCF = 424 LOC (1.44x)
     [Box plots of runtimes in seconds for FPCF and Reactive-Async at 1, 2, 4, 8, and 16 threads; whiskers = min/max, top/bottom of box = 1st and 3rd quartile, band in box = median.]
     FPCF = OPAL’s fixed point computation framework
  26. Philipp Haller Results: purity analysis • Best configurations (FPCF: 4 threads, RA: 16 threads): RA 1.75x faster • RA = 113 LOC, FPCF = 166 LOC (1.47x)
     [Box plots of runtimes in seconds for FPCF and Reactive-Async at 1, 2, 4, 8, and 16 threads.]
  27. Philipp Haller Benchmarks: Monte Carlo simulation • Compute Net Present Value • 14-16 threads: Scala futures 23-36% faster than RA
     [Runtime in ms vs. number of threads (1-16) for Sequential, Reactive Async, and Scala Futures 2.11.7, with a zoomed view of 14-16 threads.]
  28. Philipp Haller Benchmarks: parallel sum • Parallel sum of a large collection of random ints • Performance competitive with Scala’s futures
     [Runtime in ms vs. number of threads (1-16) for Sequential, Reactive Async, and Scala Futures 2.11.7.]
  29. Philipp Haller Reactive Async: Conclusion • Deterministic concurrent programming model

    – Extension of imperative, object-oriented base language – Lattices and quiescence for determinism – Resolution of cyclic dependencies – Type system for object capabilities for safety • First experimental results • Ongoing and future work: – Complete formal development – Implement state-of-the-art static analyses 30
  30. Philipp Haller Changing gears: distribution • Large-scale web applications, IoT applications, serverless computing, etc. are physically distributed systems • Distribution essential for: resilience, availability, and elasticity (subsumes scalability)
  31. Philipp Haller Lineage/provenance • Which resources are required for producing a particular expected result? • Lineage may record information about: data sets read/transformed for producing a result data set, services used for producing a response, etc. • Provides valuable information about where to inject faults: lineage-driven fault injection (LDFI)
     [1] Peter Alvaro et al. Lineage-driven fault injection. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15)
  32. Philipp Haller Distributed programming with functional lineages, a.k.a. function passing • New data-centric programming model for functional processing of distributed data • Key ideas: – Provide lineages by programming abstractions – Keep data stationary (if possible), send functions – Utilize lineages for fault injection and recovery
  33. Philipp Haller Introducing the function passing model • Consists of 3 parts: – Silos: stationary, typed, immutable data containers – SiloRefs: references to local or remote Silos – Spores: safe, serializable functions
  34. Philipp Haller Silos: what are they? • Two parts: – Silo[T]: typed, stationary data container holding a value of type T – SiloRef[T]: handle to a Silo; the user interacts with the SiloRef • SiloRefs come with 4 primitive operations: apply, send, persist, unpersist
  35. Philipp Haller Silos: primitive apply (deferred) • Takes a function that is to be applied to the data in the silo associated with the SiloRef • Creates a new silo to contain the data that the user-defined function returns; evaluation is deferred • Enables interesting computation DAGs

     def apply[S](fun: T => SiloRef[S]): SiloRef[S]
  36. Philipp Haller Silos: primitive send (eager) • Forces the built-up computation DAG to be sent to the associated node and applied • The Future is completed with the result of the computation

     def send(): Future[T]
  37. Philipp Haller Silos: silo factories (deferred) • Create a silo on a given host, populated with a given value/text file/… (a consolidated interface sketch follows below)

     object SiloRef {
       def populate[T](host: Host, value: T): SiloRef[T]
       def fromTextFile(host: Host, file: File): SiloRef[List[String]]
       ...
     }
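     Putting slides 34-37 together, a minimal sketch of the SiloRef surface as presented in this talk; the signatures of persist and unpersist are not shown on the slides and are assumptions (returning SiloRef[T] here so that chains like labels.persist().send() on slide 45 type-check):

       import scala.concurrent.Future

       trait SiloRef[T] {
         // Deferred: stage a transformation, extending the computation DAG (lineage).
         def apply[S](fun: T => SiloRef[S]): SiloRef[S]

         // Eager: ship the staged DAG to the silo's host, run it, return the result.
         def send(): Future[T]

         // Assumed signatures: keep/discard the materialized silo on its host.
         def persist(): SiloRef[T]
         def unpersist(): SiloRef[T]
       }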
  38. Philipp Haller Basic idea: apply/send
     [Diagram: a SiloRef[T] on Machine 1 refers to a Silo[T] on Machine 2; apply stages a function T ⇒ SiloRef[S], yielding a SiloRef[S] on Machine 1, and send ships it to Machine 2, where a new Silo[S] is materialized.]
  39. Philipp Haller More involved example: let’s make an interesting DAG!
     [Diagram: SiloRef[List[Person]] “persons” on Machine 1, backed by a Silo[List[Person]] on Machine 2.]

     val persons: SiloRef[List[Person]] = ...
  40. Philipp Haller More involved example (continued)

     val persons: SiloRef[List[Person]] = ...

     val adults = persons.apply(spore { ps =>
       val res = ps.filter(p => p.age >= 18)
       SiloRef.populate(currentHost, res)
     })

     [DAG so far: persons → adults]
  41. Philipp Haller More involved example (continued)

     val persons: SiloRef[List[Person]] = ...
     val vehicles: SiloRef[List[Vehicle]] = ...

     val adults = persons.apply(spore { ps =>
       val res = ps.filter(p => p.age >= 18)
       SiloRef.populate(currentHost, res)
     })

     // adults that own a vehicle
     val owners = adults.apply(spore {
       val localVehicles = vehicles  // spore header
       ps => localVehicles.apply(spore {
         val localps = ps            // spore header
         vs => SiloRef.populate(currentHost,
           localps.flatMap(p =>
             // list of (p, v) for a single person p
             vs.flatMap { v =>
               if (v.owner.name == p.name) List((p, v)) else Nil
             }))
       })
     })

     [DAG so far: persons → adults → owners; vehicles → owners]
  42. Philipp Haller More involved example (continued)

     val persons: SiloRef[List[Person]] = ...
     val vehicles: SiloRef[List[Vehicle]] = ...
     val adults = persons.apply(spore { ps =>
       val res = ps.filter(p => p.age >= 18)
       SiloRef.populate(currentHost, res)
     })
     // adults that own a vehicle
     val owners = adults.apply(...)

     [DAG so far: persons → adults → owners; vehicles → owners]
  43. Philipp Haller More involved example (continued)

     // persons, vehicles, adults, and owners as on the previous slides

     val sorted = adults.apply(spore { ps =>
       SiloRef.populate(currentHost, ps.sortBy(p => p.age))
     })
     val labels = sorted.apply(spore { ps =>
       SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
     })

     [DAG so far: persons → adults → owners/sorted; vehicles → owners; sorted → labels]
  44. Philipp Haller More involved example (continued)

     // persons, vehicles, adults, owners, sorted, and labels as on the previous slides

     So far we have just staged computation; we haven’t yet “kicked it off”.
  45. Philipp Haller More involved example (continued)

     // persons, vehicles, adults, owners, sorted, and labels as on the previous slides

     labels.persist().send()

     [Diagram: send ships the staged function List[Person] ⇒ List[String] to Machine 2, materializing a Silo[List[String]] there.] (A sketch of consuming the returned Future follows below.)
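     Since send() returns a Future of the silo’s content (slide 36), the kicked-off result can be consumed with ordinary Future combinators. A minimal sketch; the choice of ExecutionContext is an assumption:

       import scala.concurrent.ExecutionContext.Implicits.global
       import scala.util.{Failure, Success}

       // send() forces the staged DAG; here the result type is Future[List[String]].
       val greetings = labels.persist().send()

       greetings.onComplete {
         case Success(msgs) => msgs.foreach(println)          // e.g. "Hi, Alice"
         case Failure(err)  => println(s"computation failed: $err")
       }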
  46. Philipp Haller A functional design for fault-tolerance A SiloRef is

    a lineage, a persistent (in the sense of functional programming) data structure. The lineage is the DAG of operations used to derive the data of a silo. Since the lineage is composed of spores [2], it is serializable. This means it can be persisted or transferred to other machines. Putting lineages to work 48 [2] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP '14
  47. Philipp Haller A functional design for fault-tolerance: putting lineages to work • Next: we formalize lineages, a concept from the database and systems communities, in the context of PL • Natural fit in the context of functional programming! • Formalization: typed, distributed core language with spores, silos, and futures
  48. Philipp Haller Properties of function passing model Formalization Subject reduction

    theorem guarantees preservation of types under reduction, as well as preservation of lineage mobility. Progress theorem guarantees the finite materialization of remote, lineage-based data. 54 First correctness results for a programming model for lineage-based distributed computation.
  49. Philipp Haller Paper Details, proofs, etc. 56 Haller, Miller, and

    Müller. A Programming Model and Foundation for Lineage-Based Distributed Computation. Journal of Functional Programming. 2018, to appear
 https://infoscience.epfl.ch/record/230304
  50. Philipp Haller Building applications with function passing • Built two miniaturized example systems inspired by popular big data frameworks: – BabySpark (distributed collections): implemented Spark RDD operators in terms of the primitives of function passing: map, reduce, groupBy, and join – MBrace (F# async for distributing tasks): emulated MBrace using the primitives of function passing • See https://github.com/phaller/f-p/
  51. Philipp Haller Conclusion • Exploring practical deterministic concurrency – For

    an imperative, object-oriented language – Leverage recent advances in type systems • Exploring lineage-based distributed programming – First correctness results for a programming model based on lineages • Finite materialization of distributed, lineage-based data • Goal: high expressivity in distributed setting with shared state and fault tolerance 58