• Example 1:

    @volatile var x = 0
    def m(): Unit = {
      Future { x = 1 }
      Future { x = 2 }
      .. // does not access x
    }

  What’s the value of x when an invocation of m returns?
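A runnable variant of Example 1 makes the problem concrete. This is a sketch that assumes scala.concurrent's global ExecutionContext and changes m to return its futures so a caller can wait for them. Even after both writes complete, the final value is 1 or 2 depending on scheduling. Without waiting, as in the original m, x may still be 0.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object Example1 {
  @volatile var x = 0

  // Like m() on the slide, but returns the two futures so that a caller can wait for them.
  def m(): List[Future[Unit]] = {
    val f1 = Future { x = 1 }
    val f2 = Future { x = 2 }
    List(f1, f2)
  }

  // Resets x, runs m(), and waits for both writes.
  // Which write lands last is still up to the scheduler.
  def runOnce(): Int = {
    x = 0
    Await.result(Future.sequence(m()), 5.seconds)
    x
  }
}
```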
  – Lattice-based datatypes (crucial for determinism!)
  – Quiescence (crucial for determinism!)
  – Resolution of cyclic dependencies (increases expressivity!)
• Extend Scala's type system for static safety
Abstractions: cells and cell completers
  – Cell = shared "variable"
  – Cell[K,V]: read-only interface; read values of type V
  – CellCompleter[K,V]: write values of type V to its associated cell
  – V must have an instance of a lattice type class (monotonic updates)
Bounded join-semilattice, provided as a type class instance [1]:

    object IntSetLattice extends Lattice[Set[Int]] {
      val empty = Set.empty[Int]
      def join(left: Set[Int], right: Set[Int]) = left ++ right
    }

    val userIDs = CellCompleter[Set[Int]]
    // add a user ID
    userIDs.putNext(Set(theUserID))

[1] Oliveira, Moors, and Odersky. Type classes as objects and implicits. OOPSLA 2010
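To illustrate what "monotonic updates" means, here is a minimal single-process sketch in which putNext joins each write into the current value. The Lattice trait's exact shape, placing the instance in the companion so it is found implicitly, and the class SimpleCellCompleter are all assumptions for illustration, not the library's actual API.

```scala
// Hypothetical type class for bounded join-semilattices (not the library's definition).
trait Lattice[V] {
  def empty: V                   // least element
  def join(left: V, right: V): V // least upper bound
}

object Lattice {
  // The instance from the slide, placed in the companion so it is found implicitly.
  implicit object IntSetLattice extends Lattice[Set[Int]] {
    val empty: Set[Int] = Set.empty[Int]
    def join(left: Set[Int], right: Set[Int]): Set[Int] = left ++ right
  }
}

// Hypothetical, simplified completer: every write is joined into the current value,
// so the cell's value can only move up in the lattice order.
final class SimpleCellCompleter[V]()(implicit lattice: Lattice[V]) {
  private var current: V = lattice.empty

  def putNext(v: V): Unit = synchronized {
    current = lattice.join(current, v)
  }

  def value: V = synchronized { current }
}
```

Because join is commutative, associative, and idempotent, the final value does not depend on the order (or repetition) of putNext calls.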
value, how do we know this value is not going to change any more?
  – There may still be ongoing concurrent activities
  – Manual synchronization (e.g., latches) is error-prone
• Attempting to mutate a frozen cell results in a failure
• May only read from frozen cells
  – Ensures only unchangeable values are read
  – Weakens the determinism guarantee to "quasi-determinism": "All non-failing executions compute the same cell values."
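A small sketch of these freeze rules for a cell of Set[Int] joined by set union. The class FreezableCell, its freeze method, and the use of Try to signal failure are hypothetical. Following the wording "attempting to mutate", a write that would not change the frozen value is treated as a no-op rather than a failure. That choice is an assumption here.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical freezable cell holding a Set[Int]; writes are joined by set union.
final class FreezableCell {
  private var current: Set[Int] = Set.empty
  private var frozen = false

  // A write that would change a frozen cell's value fails;
  // a write that leaves the value unchanged is not a mutation.
  def putNext(v: Set[Int]): Try[Unit] = synchronized {
    val next = current ++ v
    if (next == current) Success(())
    else if (frozen) Failure(new IllegalStateException("attempt to mutate a frozen cell"))
    else { current = next; Success(()) }
  }

  def freeze(): Unit = synchronized { frozen = true }

  // Reads are only allowed once frozen, so only unchangeable values are observed.
  def read(): Try[Set[Int]] = synchronized {
    if (frozen) Success(current)
    else Failure(new IllegalStateException("cell is not frozen yet"))
  }
}
```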
a cell’s value, how do we know this value is not going to change any more?
  – There may still be ongoing concurrent activities
• Alternative solution: quiescence
• Stronger than quasi-determinism
are guaranteed not to change any more
• Technically:
  – No concurrent activities ongoing or scheduled which could change values of cells
  – Detected by the underlying thread pool
    val pool = new HandlerPool
    val userIDs = CellCompleter[Set[Int]](pool)

    // add a user ID
    userIDs.putNext(Set(theUserID))
    ..

    // register handler:
    // upon quiescence, read result value of cell
    pool.onQuiescent(userIDs.cell) { collectedIDs => .. }

Safe to read from the cell when the pool is quiescent!
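The following is one simplified way a pool could detect quiescence. It assumes a counter of scheduled-or-running tasks and runs registered callbacks when that counter drops to zero. SimpleHandlerPool, its methods, and QuiescenceDemo are hypothetical and are not the HandlerPool API shown above.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable.ListBuffer

// Hypothetical pool: counts tasks that are scheduled or running and
// runs registered callbacks whenever that count drops to zero.
final class SimpleHandlerPool(nThreads: Int) {
  private val executor = Executors.newFixedThreadPool(nThreads)
  private val pending = new AtomicInteger(0)
  private val callbacks = ListBuffer.empty[() => Unit]

  // The counter is incremented before scheduling, so a task that submits
  // further tasks keeps the pool busy until those tasks are counted too.
  def execute(task: () => Unit): Unit = {
    pending.incrementAndGet()
    executor.execute(() => {
      try task()
      finally if (pending.decrementAndGet() == 0) fireQuiescent()
    })
  }

  // Runs cb once no task is ongoing or scheduled (immediately if already quiescent).
  def onQuiescent(cb: () => Unit): Unit = {
    val runNow = synchronized {
      if (pending.get() == 0) true
      else { callbacks += cb; false }
    }
    if (runNow) cb()
  }

  private def fireQuiescent(): Unit = {
    val toRun = synchronized {
      val cbs = callbacks.toList
      callbacks.clear()
      cbs
    }
    toRun.foreach(cb => cb())
  }

  def shutdown(): Unit = executor.shutdown()
}

object QuiescenceDemo {
  // Ten tasks each add a user ID; the set is read only once the pool is quiescent.
  def run(): Set[Int] = {
    val pool = new SimpleHandlerPool(4)
    val lock = new Object
    var ids = Set.empty[Int]
    for (id <- 1 to 10) pool.execute(() => lock.synchronized { ids += id })

    val done = new CountDownLatch(1)
    var result = Set.empty[Int]
    pool.onQuiescent { () =>
      result = lock.synchronized { ids }
      done.countDown()
    }
    done.await(5, TimeUnit.SECONDS)
    pool.shutdown()
    result
  }
}
```

Registering the callback after submitting the tasks is safe in both orders. If the tasks have already finished, the callback runs immediately. Otherwise, it runs when the last task completes.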
cell "crosses" a threshold set: update another cell
  – "Crosses" = new value greater than one of the values in the threshold set (in the lattice order)

    cell2.whenNext(cell1, Set(v1, v2, v3)) { v =>
      // compute update for cell2
    }

* Non-reactive threshold reads: [2]

[2] Kuper, Turon, Krishnaswami, and Newton. Freeze after writing: Quasi-deterministic parallel programming with LVars. POPL 2014
threshold set
• Restriction: the elements of the threshold set must be pairwise incompatible
  – v1, v2 incompatible iff LUB(v1, v2) = Top
• Concurrent crossings of the threshold set due to different elements yield failed executions
  – Turns potential non-determinism into failure, thus preserving quasi-determinism
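To make the incompatibility restriction concrete, here is a sketch over a small hypothetical flat lattice: Bot < Elem(n) < Top, where two distinct Elems join to Top. The names Flat, Bot, Elem, and Top, and the helper functions, are illustrative assumptions.

```scala
// Hypothetical flat lattice: Bot < Elem(n) < Top; distinct Elems are incompatible.
sealed trait Flat
case object Bot extends Flat
final case class Elem(n: Int) extends Flat
case object Top extends Flat

object Flat {
  // Least upper bound (join).
  def lub(a: Flat, b: Flat): Flat = (a, b) match {
    case (Bot, y)         => y
    case (x, Bot)         => x
    case (x, y) if x == y => x
    case _                => Top
  }

  // a is below b in the lattice order iff joining a into b leaves b unchanged.
  def leq(a: Flat, b: Flat): Boolean = lub(a, b) == b

  def incompatible(a: Flat, b: Flat): Boolean = lub(a, b) == Top

  // The restriction: elements of a threshold set must be pairwise incompatible.
  def validThresholdSet(ts: Set[Flat]): Boolean =
    ts.toList.combinations(2).forall {
      case List(a, b) => incompatible(a, b)
      case _          => true // unreachable: combinations(2) yields pairs
    }

  // Threshold elements that the current value has reached.
  def crossed(current: Flat, ts: Set[Flat]): Set[Flat] =
    ts.filter(t => leq(t, current))
}
```

With the threshold set {Elem(1), Elem(2)}, a single write of Elem(1) crosses exactly one element. Concurrent writes of Elem(1) and Elem(2) join to Top, which crosses both elements at once. Under the restriction, such an execution becomes a failure rather than a non-deterministic choice.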
capture by closures passed to whenNext

    var x = 0
    cell2.whenNext(cell1, Set(1)) { v => NextOutcome(x) }
    cell3.whenNext(cell1, Set(1)) { v =>
      x = 1
      NoOutcome
    }
    cell1.putNext(1)

Depending on which of the two handlers runs first, cell2 is updated with 0 or 1.

Solution: use spores [3] to prevent
  – re-assigning captured variables
  – capturing mutable, shared data structures

[3] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
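The race can be reproduced deterministically by modeling the two handler bodies as plain closures over the shared variable and running them in either order. This sketch uses no real cells: CaptureDemo and its parameter are hypothetical stand-ins for the scheduler's choice.

```scala
object CaptureDemo {
  // Returns the value cell2's handler would propagate, for a given handler order.
  def cell2Value(cell3HandlerRunsFirst: Boolean): Int = {
    var x = 0
    var result = -1
    val handler2 = () => { result = x } // models: NextOutcome(x)
    val handler3 = () => { x = 1 }      // models: x = 1; NoOutcome
    if (cell3HandlerRunsFirst) { handler3(); handler2() }
    else { handler2(); handler3() }
    result
  }
}
```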
    class C {
      def set(x: Int): Unit = Global.f = x
      def get: Int = Global.f
    }
    object Global {
      var f: Int = 0
    }

    cell2.whenNext(cell1, Set(1)) { v =>
      val c = new C
      NextOutcome(c.get)
    }
    cell3.whenNext(cell1, Set(1)) { v =>
      val c = new C
      c.set(1)
      NoOutcome
    }
    cell1.putNext(1)

These closures capture no variables, yet they still share state through Global.
classes conforming to the object capability model [4]
• A class is conformant* ("ocap") iff
  – its methods only access parameters and this
  – its methods only instantiate ocap classes
  – types of fields and method parameters are ocap
* simplified

[4] Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, 2006
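This example contrasts class C from the previous example with a conformant alternative. The conformance check itself is done statically by the proposed type system and is not shown here. LocalC and OcapDemo are hypothetical. The demo runs cell3's handler body before cell2's in both variants.

```scala
// Not conformant: C's methods access the global singleton Global.
object Global { var f: Int = 0 }
class C {
  def set(x: Int): Unit = Global.f = x
  def get: Int = Global.f
}

// Conformant sketch: methods only access parameters and `this`, and the field type is Int.
final class LocalC {
  private var f: Int = 0
  def set(x: Int): Unit = f = x
  def get: Int = f
}

object OcapDemo {
  def viaGlobal(): Int = {
    Global.f = 0
    val c3 = new C
    c3.set(1) // body of cell3's handler
    val c2 = new C
    c2.get    // body of cell2's handler sees the write through Global
  }

  def viaOcap(): Int = {
    val c3 = new LocalC
    c3.set(1)
    val c2 = new LocalC
    c2.get    // fresh instance: unaffected by the other handler
  }
}
```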
  – Extension of imperative, object-oriented base language
  – Lattices and quiescence for determinism
  – Resolution of cyclic dependencies
  – Type system for object capabilities for safety
• First experimental results
• Ongoing and future work:
  – Complete formal development
  – Implement state-of-the-art static analyses
particular expected result?
Lineage may record information about:
  – Data sets read/transformed for producing a result data set
  – Services used for producing a response
  – Etc.
This provides valuable information about where to inject faults: lineage-driven fault injection (LDFI) [1]

[1] Peter Alvaro et al. Lineage-driven fault injection. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15)
New data-centric programming model for functional processing of distributed data. Key ideas:
  – Provide lineages by programming abstractions
  – Keep data stationary (if possible), send functions
  – Utilize lineages for fault injection and recovery
parts.
• Silo: typed, stationary data container.
• SiloRef: handle to a Silo. The user interacts with SiloRefs.
SiloRefs come with 4 primitive operations: apply, send, persist, and unpersist.
apply (deferred)
Takes a function that is to be applied to the data in the silo associated with the SiloRef. Creates a new silo to contain the data that the user-defined function returns; evaluation is deferred. Enables interesting computation DAGs.

    def apply[S](fun: T => SiloRef[S]): SiloRef[S]
send (eager)
Forces the built-up computation DAG to be sent to the associated node and applied. The returned future is completed with the result of the computation.

    def send(): Future[T]
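The contrast between deferred apply and eager send can be sketched in a single process. LocalSiloRef and SiloDemo are hypothetical stand-ins: the sketch ignores hosts, spores, persist, and unpersist, and represents a SiloRef as a recipe for computing its data.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical single-process stand-in for SiloRef: instead of holding data,
// it holds a recipe for computing the data.
final class LocalSiloRef[T] private (private val compute: () => T) {
  // Deferred: only extends the computation DAG; nothing runs yet.
  def apply[S](fun: T => LocalSiloRef[S]): LocalSiloRef[S] =
    new LocalSiloRef[S](() => fun(compute()).compute())

  // Eager: runs the built-up DAG and completes the future with the result.
  def send(): Future[T] = Future(compute())
}

object LocalSiloRef {
  // Stand-in for SiloRef.populate(host, data); there is only one "host" here.
  def populate[T](data: T): LocalSiloRef[T] = new LocalSiloRef(() => data)
}

object SiloDemo {
  @volatile var applications = 0

  // Returns (applications before send, applications after send, result).
  def run(): (Int, Int, List[Int]) = {
    applications = 0
    val numbers = LocalSiloRef.populate(List(1, 2, 3, 4))
    val evens = numbers.apply { xs =>
      applications += 1
      LocalSiloRef.populate(xs.filter(_ % 2 == 0))
    }
    val before = applications
    val result = Await.result(evens.send(), 5.seconds)
    (before, applications, result)
  }
}
```

The user-defined function has not run when apply returns. It runs exactly once, when send forces the DAG.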
make an interesting DAG!
(Figure: computation DAG; nodes: persons, vehicles, adults, owners; label: Machine 2)

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(spore {
      val localVehicles = vehicles // spore header
      ps =>
        localVehicles.apply(spore {
          val localps = ps // spore header
          vs =>
            SiloRef.populate(currentHost,
              localps.flatMap(p =>
                // list of (p, v) for a single person p
                vs.flatMap { v =>
                  if (v.owner.name == p.name) List((p, v)) else Nil
                }
              )
            )
        })
    })
(Figure: computation DAG; nodes: persons, vehicles, adults, owners, sorted, labels; label: Machine 2)

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })

So far we have only staged the computation; we haven't yet "kicked it off".
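The code below shows what each staged node would compute once the DAG is sent, using the same dataflow on local Lists. Person and Vehicle are hypothetical case classes, since the slides elide their definitions. Each node's populate call is replaced by returning the list directly.

```scala
// Hypothetical data types; the slides do not show their definitions.
final case class Person(name: String, age: Int)
final case class Vehicle(owner: Person, model: String)

object DagOnLists {
  def adults(ps: List[Person]): List[Person] = ps.filter(p => p.age >= 18)

  // For each person p, the list of (p, v) pairs for the vehicles v owned by p.
  def owners(ps: List[Person], vs: List[Vehicle]): List[(Person, Vehicle)] =
    ps.flatMap(p => vs.flatMap { v => if (v.owner.name == p.name) List((p, v)) else Nil })

  def sorted(ps: List[Person]): List[Person] = ps.sortBy(p => p.age)

  def labels(ps: List[Person]): List[String] = ps.map(p => "Hi, " + p.name)
}
```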
Putting lineages to work
a lineage, a persistent (in the sense of functional programming) data structure. The lineage is the DAG of operations used to derive the data of a silo. Since the lineage is composed of spores [2], it is serializable. This means it can be persisted or transferred to other machines.

[2] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP '14
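Because a lineage records how a silo's data was derived, lost data can be recomputed by replaying it. This sketch assumes a hypothetical Lineage ADT made of a source plus named steps, which is a simplification of a chain rather than a general DAG. It does not model spores or shipping lineages to other machines. Lineage, Source, Step, and RecoverableSilo are illustrative names only.

```scala
// Hypothetical lineage: source data, or a named step applied to a parent lineage.
sealed trait Lineage[T] {
  def eval(): T              // recompute the data from scratch
  def describe: List[String] // names of the operations, source first
}

final case class Source[T](name: String, load: () => T) extends Lineage[T] {
  def eval(): T = load()
  def describe: List[String] = List(name)
}

final case class Step[A, T](parent: Lineage[A], name: String, f: A => T) extends Lineage[T] {
  def eval(): T = f(parent.eval())
  def describe: List[String] = parent.describe :+ name
}

// A silo caches its materialized data; lost data is recomputed from the lineage.
final class RecoverableSilo[T](val lineage: Lineage[T]) {
  private var cached: Option[T] = None
  private var count = 0

  def recomputations: Int = count

  def data: T = cached match {
    case Some(d) => d
    case None =>
      count += 1
      val d = lineage.eval()
      cached = Some(d)
      d
  }

  // Simulates a fault, e.g. losing the machine that held the data.
  def lose(): Unit = cached = None
}
```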
database + systems communities, in the context of PL. Natural fit in the context of functional programming!
A functional design for fault tolerance.
Formalization: typed, distributed core language with spores, silos, and futures.
theorem guarantees preservation of types under reduction, as well as preservation of lineage mobility. Progress theorem guarantees the finite materialization of remote, lineage-based data.
First correctness results for a programming model for lineage-based distributed computation.
Müller. A Programming Model and Foundation for Lineage-Based Distributed Computation. Journal of Functional Programming. 2018, to appear https://infoscience.epfl.ch/record/230304
example systems inspired by popular big data frameworks:
• BabySpark (distributed collections): implemented Spark RDD operators in terms of the primitives of function passing: map, reduce, groupBy, and join
• MBrace (F# async for distributing tasks): emulated MBrace using the primitives of function passing
See https://github.com/phaller/f-p/
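The sketch below shows how RDD-style operators can be expressed with only apply and populate, in the spirit of BabySpark. The tiny LocalSiloRef is a hypothetical local stand-in with deferred apply and populate without a host. The Ops signatures are illustrative and are not taken from the f-p repository.

```scala
// Hypothetical local stand-in: a SiloRef is a deferred computation of its data.
final class LocalSiloRef[T](val compute: () => T) {
  def apply[S](fun: T => LocalSiloRef[S]): LocalSiloRef[S] =
    new LocalSiloRef[S](() => fun(compute()).compute())
}

object LocalSiloRef {
  def populate[T](data: T): LocalSiloRef[T] = new LocalSiloRef(() => data)
}

// RDD-style operators defined only in terms of apply and populate.
object Ops {
  import LocalSiloRef.populate

  def map[A, B](s: LocalSiloRef[List[A]])(f: A => B): LocalSiloRef[List[B]] =
    s.apply(xs => populate(xs.map(f)))

  // Assumes a non-empty list, like List.reduce.
  def reduce[A](s: LocalSiloRef[List[A]])(op: (A, A) => A): LocalSiloRef[A] =
    s.apply(xs => populate(xs.reduce(op)))

  def groupBy[A, K](s: LocalSiloRef[List[A]])(key: A => K): LocalSiloRef[Map[K, List[A]]] =
    s.apply(xs => populate(xs.groupBy(key)))

  // Join via nested apply, as in the owners example.
  def join[A, B, K](l: LocalSiloRef[List[A]], r: LocalSiloRef[List[B]])(
      kl: A => K, kr: B => K): LocalSiloRef[List[(A, B)]] =
    l.apply(xs => r.apply(ys => populate(for (a <- xs; b <- ys if kl(a) == kr(b)) yield (a, b))))
}
```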
an imperative, object-oriented language
  – Leverage recent advances in type systems
• Exploring lineage-based distributed programming
  – First correctness results for a programming model based on lineages
• Finite materialization of distributed, lineage-based data
• Goal: high expressivity in distributed setting with shared state and fault tolerance