Slide 1

Slide 1 text

Lineages as a first-class construct for fault-tolerant distributed programming
Philipp Haller
KTH Royal Institute of Technology, Stockholm, Sweden
Chaos Engineering Day, Stockholm, Sweden, December 6th, 2017

Slide 2

Slide 2 text

Distributed programming is everywhere!
Large-scale web applications, IoT applications, serverless computing, etc.
Distribution is essential for:
• Resilience
• Elasticity (subsumes scalability)
• Physically distributed systems
• Availability

Slide 3

Slide 3 text

My first steps in distributed programming
https://www.lightbend.com/akka-five-year-anniversary
Scala Actors used, e.g., in the core message queue system of Twitter.

Slide 4

Slide 4 text

Robustness via fault injection testing
For each expected system response: inject faults which could prevent the response.
Fault: e.g., kill a machine.
Goal: automate the selection of faults to inject.
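
To make the goal concrete, here is a minimal sketch of the naive approach: enumerate every fault set up to a given size and check whether the expected response still arrives. The system model (`responds`) and the topology are hypothetical and chosen only for illustration; they are not taken from the talk.

```scala
object FaultInjection {
  type Node = String

  // Hypothetical system under test: the client gets a response if N1 is up
  // and at least one of the replicas N2, N3 is up. N4 is not involved.
  def responds(crashed: Set[Node]): Boolean =
    !crashed("N1") && (!crashed("N2") || !crashed("N3"))

  // Enumerate every set of at most maxFaults crashed nodes and keep the
  // sets that prevent the expected response.
  def failingFaultSets(nodes: Set[Node], maxFaults: Int): Set[Set[Node]] =
    nodes.subsets().filter(fs => fs.size <= maxFaults && !responds(fs)).toSet
}
```

With four nodes and up to two faults this already tries 11 candidate sets, and the number grows combinatorially with the cluster size, which is why selecting faults automatically matters.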

Slide 5

Slide 5 text

Example
[Diagram: a Client and nodes N1, N2, N3, N4, with a "BOOM" fault marker.]

Slide 6

Slide 6 text

Lineage/provenance
Which resources are required for producing a particular expected result?
Lineage may record information about:
• Data sets read/transformed for producing the result data set
• Services used for producing the response
• Etc.
Provides valuable information about where to inject faults: lineage-driven fault injection (LDFI) [1]
[1] Peter Alvaro, et al. Lineage-driven fault injection. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (SIGMOD '15)
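
A toy illustration of how lineage narrows the search, assuming the lineage of a result is recorded as a set of alternative "supports" (each support is the set of nodes used by one way of producing the result). A fault set can prevent the result only if it hits every support. The brute-force enumeration and all names below are assumptions made for this sketch; the approach published in [1] is more sophisticated.

```scala
object LineageDrivenSelection {
  type Node = String

  // A fault set prevents the result only if it intersects every support.
  def hitsAll(faults: Set[Node], supports: Set[Set[Node]]): Boolean =
    supports.forall(s => s.exists(faults))

  // Minimal fault sets worth injecting, drawn only from nodes in the lineage.
  def minimalFaultSets(supports: Set[Set[Node]]): Set[Set[Node]] = {
    val relevant = supports.flatten
    val hitting = relevant.subsets().filter(hitsAll(_, supports)).toSet
    // Drop every hitting set that strictly contains a smaller hitting set.
    hitting.filter(h => !hitting.exists(o => o != h && o.subsetOf(h)))
  }
}
```

Nodes that never appear in the lineage (for example an uninvolved N4) are never considered, so no injection runs are spent on them.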

Slide 7

Slide 7 text

Distributed programming with functional lineages, a.k.a. function passing
New data-centric programming model for functional processing of distributed data.
Key ideas:
• Provide lineages by programming abstractions
• Keep data stationary (if possible), send functions
• Utilize lineages for fault injection and recovery

Slide 8

Slide 8 text

Introducing the Function Passing Model
Consists of 3 parts:
• Silos: stationary, typed, immutable data containers.
• SiloRefs: references to local or remote Silos.
• Spores: safe, serializable functions.

Slide 9

Slide 9 text

The Function Passing Model
Some visual intuition
[Diagram labels: Silo, SiloRef, Master, Worker]

Slide 10

Slide 10 text

Silos: What are they?
Two parts:
• Silo[T]: typed, stationary data container.
• SiloRef[T]: handle to a Silo.
The user interacts with the SiloRef.
SiloRefs come with 4 primitive operations: apply, send, persist, unpersist.

Slide 11

Slide 11 text

Silos: What are they?
Primitive: apply (deferred)
Takes a function that is to be applied to the data in the silo associated with the SiloRef.
Creates a new silo to contain the data that the user-defined function returns; evaluation is deferred.
    def apply[S](fun: T => SiloRef[S]): SiloRef[S]
Enables interesting computation DAGs.

Slide 12

Slide 12 text

Silos: What are they?
Primitive: send (eager)
Forces the built-up computation DAG to be sent to the associated node and applied.
The Future is completed with the result of the computation.
    def send(): Future[T]
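
The deferred/eager split can be sketched with a single-process stand-in. This is an illustration under strong assumptions, not the library's implementation: there are no hosts, plain functions replace spores, and `materialize`, `Populated`, and `Applied` are hypothetical names.

```scala
import scala.concurrent.{ExecutionContext, Future}

// Single-process stand-in: no hosts, no spores, no serialization.
object LocalSilos {
  sealed trait SiloRef[T] {
    // Hypothetical helper (not one of the four primitives): computes the data.
    def materialize(): T

    // Deferred: only records a new node in the computation DAG.
    def apply[S](fun: T => SiloRef[S]): SiloRef[S] = Applied(this, fun)

    // Eager: evaluates the recorded DAG and completes a Future with the result.
    def send()(implicit ec: ExecutionContext): Future[T] = Future(materialize())
  }

  final case class Populated[T](value: T) extends SiloRef[T] {
    def materialize(): T = value
  }

  final case class Applied[A, T](parent: SiloRef[A], fun: A => SiloRef[T])
      extends SiloRef[T] {
    def materialize(): T = fun(parent.materialize()).materialize()
  }

  def populate[T](value: T): SiloRef[T] = Populated(value)
}
```

Calling `apply` runs none of the user function; only `send()` triggers evaluation.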

Slide 13

Slide 13 text

Silos: What are they?
Primitive: persist (deferred)
Ensures the silo is cached in memory.
    def persist(): SiloRef[T]

Slide 14

Slide 14 text

Silos: What are they?
Primitive: unpersist (deferred)
Enables the silo to be removed from memory.
    def unpersist(): SiloRef[T]

Slide 15

Slide 15 text

Silos: What are they?
Silo factories: create a silo on a given host containing a given value/text file/…
    object SiloRef {
      def populate[T](host: Host, value: T): SiloRef[T]
      def fromTextFile(host: Host, file: File): SiloRef[List[String]]
      ...
    }
Deferred

Slide 16

Slide 16 text

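Basic idea: apply/send
[Diagram: Machine 1 and Machine 2. A function λ of type T ⇒ SiloRef[S] is applied to the data T in Silo[T] (referenced by SiloRef[T]), producing data S in a new Silo[S] (referenced by SiloRef[S]).]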

Slide 17

Slide 17 text

More involved example
Let’s make an interesting DAG!
[Diagram: Machine 1 and Machine 2; SiloRef[List[Person]] "persons" refers to Silo[List[Person]].]
    val persons: SiloRef[List[Person]] = ...

Slide 18

Slide 18 text

More involved example
Let’s make an interesting DAG!
[Diagram: Machine 1 and Machine 2; SiloRef[List[Person]] "persons" refers to Silo[List[Person]]. DAG nodes: persons, adults.]
    val persons: SiloRef[List[Person]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

Slide 19

Slide 19 text

More involved example
Let’s make an interesting DAG!
[Diagram: Machine 1 and Machine 2; SiloRef[List[Person]] "persons" refers to Silo[List[Person]]. DAG nodes: persons, adults, vehicles, owners.]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(spore {
      val localVehicles = vehicles // spore header
      ps =>
        localVehicles.apply(spore {
          val localps = ps // spore header
          vs =>
            SiloRef.populate(currentHost,
              localps.flatMap(p =>
                // list of (p, v) for a single person p
                vs.flatMap { v =>
                  if (v.owner.name == p.name) List((p, v)) else Nil
                }
              ))
        })
    })

Slide 20

Slide 20 text

More involved example
Let’s make an interesting DAG!
[Diagram: Machine 1 and Machine 2; SiloRef[List[Person]] "persons" refers to Silo[List[Person]]. DAG nodes: persons, adults, vehicles, owners.]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

Slide 21

Slide 21 text

More involved example
Let’s make an interesting DAG!
[Diagram: Machine 1 and Machine 2; SiloRef[List[Person]] "persons" refers to Silo[List[Person]]. DAG nodes: persons, adults, vehicles, owners, sorted, labels.]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })

Slide 22

Slide 22 text

More involved example
Let’s make an interesting DAG!
[Diagram: Machine 1 and Machine 2; SiloRef[List[Person]] "persons" refers to Silo[List[Person]]. DAG nodes: persons, adults, vehicles, owners, sorted, labels.]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })
So far we have only staged the computation; we haven’t yet “kicked it off”.

Slide 23

Slide 23 text

More involved example
Let’s make an interesting DAG!
[Diagram: Machine 1 and Machine 2; SiloRef[List[Person]] "persons" refers to Silo[List[Person]]. DAG nodes: persons, adults, vehicles, owners, sorted, labels. A function λ of type List[Person] ⇒ List[String] is sent, producing a Silo[List[String]].]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })

    labels.persist().send()
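
For readers who want to run the adults/sorted/labels pipeline, here is a local sketch. It assumes a thunk-based stand-in for `SiloRef` with no hosts and plain functions instead of spores, so `populate` takes no `currentHost`; `Person`, `labelsFor`, and `run` are hypothetical names. Sorting uses `sortBy`, since `sortWith` expects a two-argument comparison.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PersonsPipeline {
  // Local stand-in for SiloRef: a deferred computation, nothing is remote.
  final class SiloRef[T] private (thunk: () => T) {
    private def force(): T = thunk()
    // Deferred: composes the function into a new thunk.
    def apply[S](fun: T => SiloRef[S]): SiloRef[S] =
      new SiloRef[S](() => fun(thunk()).force())
    // Deferred: the result is computed at most once and then kept in memory.
    def persist(): SiloRef[T] = {
      lazy val cached = thunk()
      new SiloRef[T](() => cached)
    }
    // Eager: kicks off the staged computation.
    def send()(implicit ec: ExecutionContext): Future[T] = Future(thunk())
  }
  object SiloRef {
    def populate[T](value: T): SiloRef[T] = new SiloRef[T](() => value)
  }

  final case class Person(name: String, age: Int)

  // Stages the DAG persons -> adults -> sorted -> labels without running it.
  def labelsFor(persons: SiloRef[List[Person]]): SiloRef[List[String]] = {
    val adults = persons.apply(ps => SiloRef.populate(ps.filter(_.age >= 18)))
    val sorted = adults.apply(ps => SiloRef.populate(ps.sortBy(_.age)))
    sorted.apply(ps => SiloRef.populate(ps.map(p => "Hi, " + p.name)))
  }

  def run(people: List[Person]): List[String] = {
    import ExecutionContext.Implicits.global
    Await.result(labelsFor(SiloRef.populate(people)).persist().send(), 5.seconds)
  }
}
```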

Slide 24

Slide 24 text

A functional design for fault-tolerance
Putting lineages to work
A SiloRef is a lineage: a persistent (in the sense of functional programming) data structure.
The lineage is the DAG of operations used to derive the data of each silo.
Since the lineage is composed of spores [2], it is serializable. This means it can be persisted or transferred to other machines.
[2] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
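
One way to picture recovery from a lineage: if the materialized data is lost, replay the recorded operations from the source. The sketch below assumes a simplified linear lineage and an in-process `Replica` that can "crash"; every name in it is hypothetical, and real lineages would be built from serializable spores rather than plain closures.

```scala
object LineageRecovery {
  // An immutable description of how a value is derived.
  sealed trait Lineage[T] { def replay(): T }

  final case class Source[T](load: () => T) extends Lineage[T] {
    def replay(): T = load()
  }

  final case class Step[A, T](parent: Lineage[A], f: A => T) extends Lineage[T] {
    def replay(): T = f(parent.replay())
  }

  // Hypothetical holder of materialized silo data that may lose its memory.
  final class Replica[T](lineage: Lineage[T]) {
    private var data: Option[T] = None
    var replays = 0 // counts how often the lineage had to be re-run

    def crash(): Unit = { data = None }

    def read(): T = data.getOrElse {
      replays += 1
      val v = lineage.replay()
      data = Some(v)
      v
    }
  }
}
```

Because lineage nodes are immutable, several derived lineages can share the same parent without copying, which is the sense in which the structure is persistent.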

Slide 25

Slide 25 text

A functional design for fault-tolerance
Putting lineages to work
Next: we formalize lineages, a concept from the database and systems communities, in the context of programming languages (PL).
Natural fit in the context of functional programming!
Formalization: typed, distributed core language with spores, silos, and futures.

Slide 26

Slide 26 text

Properties of the function passing model
Formalization
• Subject reduction theorem guarantees preservation of types under reduction, as well as preservation of lineage mobility.
• Progress theorem guarantees the finite materialization of remote, lineage-based data.
First correctness results for a programming model for lineage-based distributed computation.

Slide 27

Slide 27 text

Building applications with function passing
Built two miniaturized example systems inspired by popular big data frameworks.
• BabySpark (distributed collections): implemented Spark RDD operators in terms of the primitives of function passing: map, reduce, groupBy, and join.
• MBrace (F# async for distributing tasks): emulated MBrace using the primitives of function passing.
See https://github.com/phaller/f-p/
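
As a rough idea of how collection operators can be layered on `apply`, here is a sketch of `map` and `reduce` over a minimal local stand-in for `SiloRef`. It is not the BabySpark code from the repository; the stand-in, `populate` without a host, and the object name `BabyOps` are assumptions made for illustration.

```scala
object BabyOps {
  // Minimal local stand-in for SiloRef (real SiloRefs may refer to remote silos).
  final class SiloRef[T](val force: () => T) {
    def apply[S](fun: T => SiloRef[S]): SiloRef[S] =
      new SiloRef[S](() => fun(force()).force())
  }

  def populate[T](value: T): SiloRef[T] = new SiloRef[T](() => value)

  // map: transform every element, keeping the result in a new silo.
  def map[A, B](ref: SiloRef[List[A]])(f: A => B): SiloRef[List[B]] =
    ref.apply(xs => populate(xs.map(f)))

  // reduce: combine all elements; fails on an empty list, like List.reduce.
  def reduce[A](ref: SiloRef[List[A]])(op: (A, A) => A): SiloRef[A] =
    ref.apply(xs => populate(xs.reduce(op)))
}
```

Both operators stay deferred: they return a new `SiloRef` and compute nothing until the result is forced.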

Slide 28

Slide 28 text

Find out more!
References
• Haller, Miller, and Müller. A Programming Model and Foundation for Lineage-Based Distributed Computation. 2017. Draft: https://infoscience.epfl.ch/record/230304
• Miller, Haller, Müller, and Boullier. Function passing: a model for typed, distributed functional programming. Onward! 2016

Slide 29

Slide 29 text

Ongoing and future work
• Integrate function passing and serverless computing
• Lineage-driven fault-injection for the function passing model
• Lineage-driven fault-injection for serverless computing
• Composition of serverless functions = serverless function
Lineages:
• enable precise fault injection and recovery
• provide a design space for perturbation models
Thank you!