Lineages as a first-class construct for fault-tolerant distributed programming

Philipp Haller
December 06, 2017

Transcript

  1. Lineages as a first-class construct
    for fault-tolerant distributed
    programming
    Philipp Haller
    KTH Royal Institute of Technology
    Stockholm, Sweden
    Chaos Engineering Day
    Stockholm, Sweden, December 6th, 2017

  2. Distributed programming is everywhere!
    Large-scale web applications, IoT applications,
    serverless computing, etc.
    Distribution essential for:
    Resilience
    Elasticity (subsumes scalability)
    Physically distributed systems
    Availability

  3. My first steps in distributed programming
    https://www.lightbend.com/akka-five-year-anniversary
    Scala Actors used, e.g.,
    in core message queue
    system of Twitter:

  4. Robustness via fault injection testing
    For each expected system response:
    inject faults which could prevent response
    Fault: e.g., kill machine
    Goal: automate selection of faults to inject

  5. Example
    [Diagram: Client and nodes N1, N2, N3, N4; a fault is injected ("BOOM")]

  6. Lineage/provenance
    Which resources are required for producing a
    particular expected result?
    Lineage may record information about:
    Data sets read/transformed for producing result data set
    Services used for producing response
    Etc.
    Provides valuable information about
    where to inject faults
    Lineage-driven fault injection (LDFI) [1]
    [1] Peter Alvaro, et al. Lineage-driven fault injection. In Proceedings of the 2015
    ACM SIGMOD International Conference on Management of Data (SIGMOD '15)

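How lineage can guide the choice of faults might be sketched as follows. This is an illustration only, not the LDFI implementation from [1]: the names `Lineage`, `breaksAll`, and `minimalFaultSets` are hypothetical, a lineage is assumed to be a set of alternative supports (each one the set of nodes that a successful derivation of the response relied on), and the search is plain brute force.

```scala
object LdfiSketch {
  type Node = String

  // Hypothetical model: each support is the set of nodes that one
  // successful derivation of the expected response relied on.
  final case class Lineage(supports: Set[Set[Node]])

  // A fault set can only prevent the response if it hits every support.
  def breaksAll(faults: Set[Node], lineage: Lineage): Boolean =
    lineage.supports.forall(support => support.exists(faults.contains))

  // Brute-force search for the smallest fault sets worth injecting.
  def minimalFaultSets(lineage: Lineage): Set[Set[Node]] = {
    val nodes: Set[Node] = lineage.supports.flatten
    (1 to nodes.size).iterator
      .map(k => nodes.subsets(k).filter(fs => breaksAll(fs, lineage)).toSet)
      .find(_.nonEmpty)
      .getOrElse(Set.empty[Set[Node]])
  }

  def main(args: Array[String]): Unit = {
    // Two alternative ways to produce the response, both going through N1.
    val lineage = Lineage(Set(Set("N1", "N2"), Set("N1", "N3")))
    println(minimalFaultSets(lineage))
  }
}
```

With the two supports {N1, N2} and {N1, N3}, killing N1 alone hits both, so it is the only minimal candidate. Nodes that appear in no support are never proposed, which is the sense in which lineage narrows down where to inject faults.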

  7. Distributed programming with functional
    lineages a.k.a. function passing
    New data-centric programming model for functional
    processing of distributed data.
    Key ideas:
    Provide lineages by programming abstractions
    Keep data stationary (if possible), send functions
    Utilize lineages for fault injection and recovery

  8. Introducing the Function Passing Model
    Consists of 3 parts:
    Silos: stationary, typed, immutable data containers.
    SiloRefs: references to local or remote Silos.
    Spores: safe, serializable functions.

  9. Some Visual Intuition of the Function Passing Model
    [Diagram: Master and Worker nodes, with Silos and SiloRefs]

  10. Silos
    What are they?
    Two parts.
    Silo. Typed, stationary data container.
    SiloRef. Handle to a Silo.
    User interacts with SiloRef.
    SiloRefs come with 4 primitive operations.
    def apply
    def send
    def persist
    def unpersist
    [Diagram: SiloRef[T] referring to a Silo[T] that holds data of type T]

  11. Silos
    What are they?
    Primitive: apply (deferred)
    Takes a function that is to be applied to the data in the
    silo associated with the SiloRef.
    Creates new silo to contain the data that the user-defined
    function returns; evaluation is deferred.
    def apply[S](fun: T => SiloRef[S]): SiloRef[S]
    Enables interesting computation DAGs.

  12. Silos
    What are they?
    Primitive: send (eager)
    Forces the built-up computation DAG to be sent to the
    associated node and applied.
    Future is completed with the result of the computation.
    def send(): Future[T]

  13. Silos
    What are they?
    Primitive: persist (deferred)
    Ensures silo is cached in memory.
    def persist(): SiloRef[T]

  14. Silos
    What are they?
    Primitive: unpersist (deferred)
    Enables silo to be removed from memory.
    def unpersist(): SiloRef[T]

  15. Silos
    What are they?
    Silo factories (deferred):
    Creates silo on given host containing given value/text file/…
    object SiloRef {
      def populate[T](host: Host, value: T): SiloRef[T]
      def fromTextFile(host: Host, file: File): SiloRef[List[String]]
      ...
    }

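To get a feel for how the four primitives and a factory fit together, here is a minimal single-JVM sketch. It is not the implementation from the function-passing project: `LocalSilos` and the `materialize` helper are hypothetical, there are no hosts or network, plain Scala functions stand in for spores, and `populate` drops the `host` parameter shown on the slide.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object LocalSilos {
  // Hypothetical single-JVM stand-in for SiloRef: no hosts, no network,
  // and plain functions instead of spores.
  abstract class SiloRef[T] { self =>
    // Helper for this sketch only (not one of the four primitives):
    // computes the silo's data by running the staged operations.
    def materialize(): T

    // Deferred: stages `fun` and returns a handle to the eventual result.
    def apply[S](fun: T => SiloRef[S]): SiloRef[S] = new SiloRef[S] {
      def materialize(): S = fun(self.materialize()).materialize()
    }

    // Eager: runs the staged computation and completes a Future.
    def send(): Future[T] = Future(materialize())

    // Deferred: the returned handle computes the data at most once.
    def persist(): SiloRef[T] = new SiloRef[T] {
      private lazy val cached: T = self.materialize()
      def materialize(): T = cached
      override def unpersist(): SiloRef[T] = self
    }

    // Deferred: without a cache there is nothing to drop.
    def unpersist(): SiloRef[T] = this
  }

  // Factory, minus the `host` parameter of the slide's signature.
  def populate[T](value: T): SiloRef[T] = new SiloRef[T] {
    def materialize(): T = value
  }

  def main(args: Array[String]): Unit = {
    val doubled = populate(List(1, 2, 3)).apply(xs => populate(xs.map(_ * 2)))
    // Nothing has run yet; send() triggers the evaluation.
    println(Await.result(doubled.persist().send(), 5.seconds))
  }
}
```

The point of the sketch is the split between deferred and eager operations: `apply`, `persist`, and `unpersist` only build new handles, while `send` is the single place where work happens.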

  16. Basic idea: apply/send
    [Diagram: Machine 1 and Machine 2; SiloRef[T] referring to Silo[T] (data T); function λ: T ⇒ SiloRef[S]; resulting SiloRef[S] and Silo[S] (data S)]

  17. More involved example
    Let’s make an interesting DAG!
    [Diagram: Machine 1 and Machine 2; persons is a SiloRef[List[Person]] referring to a Silo[List[Person]]]
    val persons: SiloRef[List[Person]] = ...

  18. More involved example
    Let’s make an interesting DAG!
    [Diagram: Machine 1 and Machine 2; persons is a SiloRef[List[Person]] referring to a Silo[List[Person]]; DAG node: adults]
    val persons: SiloRef[List[Person]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })

  19. More involved example
    Let’s make an interesting DAG!
    [Diagram: Machine 1 and Machine 2; persons is a SiloRef[List[Person]] referring to a Silo[List[Person]]; DAG nodes: adults, vehicles, owners]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(spore {
      val localVehicles = vehicles // spore header
      ps =>
        localVehicles.apply(spore {
          val localps = ps // spore header
          vs =>
            SiloRef.populate(currentHost,
              localps.flatMap(p =>
                // list of (p, v) for a single person p
                vs.flatMap {
                  v =>
                    if (v.owner.name == p.name) List((p, v))
                    else Nil
                }
              ))
        })
    })

  20. More involved example
    Let’s make an interesting DAG!
    [Diagram: Machine 1 and Machine 2; persons is a SiloRef[List[Person]] referring to a Silo[List[Person]]; DAG nodes: adults, vehicles, owners]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)

  21. More involved example
    Let’s make an interesting DAG!
    [Diagram: Machine 1 and Machine 2; persons is a SiloRef[List[Person]] referring to a Silo[List[Person]]; DAG nodes: adults, vehicles, owners, sorted, labels]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)
    val sorted =
      adults.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.sortBy(p => p.age))
      })
    val labels =
      sorted.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.map(p => "Hi, " + p.name))
      })

  22. More involved example
    Let’s make an interesting DAG!
    [Diagram: Machine 1 and Machine 2; persons is a SiloRef[List[Person]] referring to a Silo[List[Person]]; DAG nodes: adults, vehicles, owners, sorted, labels]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)
    val sorted =
      adults.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.sortBy(p => p.age))
      })
    val labels =
      sorted.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.map(p => "Hi, " + p.name))
      })
    So far we have just staged the computation; we haven’t yet
    “kicked it off”.

  23. More involved example
    Let’s make an interesting DAG!
    [Diagram: Machine 1 and Machine 2; persons is a SiloRef[List[Person]] referring to a Silo[List[Person]]; DAG nodes: adults, vehicles, owners, sorted, labels; for labels, a function λ: List[Person] ⇒ List[String] and a resulting Silo[List[String]]]
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)
    val sorted =
      adults.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.sortBy(p => p.age))
      })
    val labels =
      sorted.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.map(p => "Hi, " + p.name))
      })
    labels.persist().send()

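The persons, adults, sorted, labels chain from the slides can be tried out locally with a small stand-in. The sketch below is an assumption-laden approximation: `Ref`, `run`, `stagesRun`, and `labelsFor` are hypothetical names, plain functions replace spores, there is no `currentHost`, and `persist()` is left out. The `Person` fields `name` and `age` are the ones the slides use.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PersonsPipeline {
  final case class Person(name: String, age: Int)

  // Hypothetical local stand-in for SiloRef (no hosts, spores, or network).
  final class Ref[T](compute: () => T) {
    // Deferred: wraps the staged computation in a new one.
    def apply[S](fun: T => Ref[S]): Ref[S] =
      new Ref[S](() => fun(compute()).run())
    def run(): T = compute()
    // Eager: actually runs everything staged so far.
    def send(): Future[T] = Future(compute())
  }
  def populate[T](value: T): Ref[T] = new Ref[T](() => value)

  // Counts how many stages have actually executed.
  val stagesRun = new AtomicInteger(0)

  def labelsFor(people: List[Person]): Ref[List[String]] = {
    val persons = populate(people)
    val adults = persons.apply { ps =>
      stagesRun.incrementAndGet()
      populate(ps.filter(p => p.age >= 18))
    }
    val sorted = adults.apply { ps =>
      stagesRun.incrementAndGet()
      populate(ps.sortBy(p => p.age))
    }
    sorted.apply { ps =>
      stagesRun.incrementAndGet()
      populate(ps.map(p => "Hi, " + p.name))
    }
  }

  def main(args: Array[String]): Unit = {
    val labels = labelsFor(
      List(Person("Ann", 34), Person("Bob", 12), Person("Cy", 21)))
    println(s"stages run before send: ${stagesRun.get()}")
    val result = Await.result(labels.send(), 5.seconds)
    println(s"stages run after send: ${stagesRun.get()}")
    result.foreach(println)
  }
}
```

Building `labels` only stages the three steps; the counter stays at zero until `send()` is called, which mirrors the "staged, not yet kicked off" remark on the previous slide.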

  24. A functional design for fault-tolerance
    Putting lineages to work
    A SiloRef is a lineage, a persistent (in the sense
    of functional programming) data structure.
    The lineage is the DAG of operations used to
    derive the data of each silo.
    Since the lineage is composed of spores [2], it is
    serializable. This means it can be persisted or
    transferred to other machines.
    [2] Miller, Haller, and Odersky. Spores: a type-based foundation for closures
    in the age of concurrency and distribution. ECOOP 2014

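One way to picture a lineage as a persistent data structure is as an immutable tree of operations that can be replayed. The following is a simplified sketch, not the representation used by the function-passing implementation: `Source`, `Step`, `ops`, and `replay` are hypothetical, the data is fixed to `List[Int]`, and named operations stand in for spores so that the lineage stays plain data.

```scala
object LineageSketch {
  // Immutable lineage nodes; derived lineages share their parents.
  sealed trait Lineage
  final case class Source(values: List[Int]) extends Lineage
  final case class Step(parent: Lineage, op: String) extends Lineage

  // Named operations stand in for spores, so a lineage is plain data.
  val ops: Map[String, List[Int] => List[Int]] = Map(
    "evens"  -> ((xs: List[Int]) => xs.filter(_ % 2 == 0)),
    "double" -> ((xs: List[Int]) => xs.map(_ * 2))
  )

  // Recovery: recompute a silo's data by replaying its lineage.
  def replay(lineage: Lineage): List[Int] = lineage match {
    case Source(values)   => values
    case Step(parent, op) => ops(op)(replay(parent))
  }

  def main(args: Array[String]): Unit = {
    val source  = Source(List(1, 2, 3, 4))
    val evens   = Step(source, "evens")
    val doubled = Step(evens, "double") // shares `evens` and `source`
    // If the node holding the data for `doubled` is lost,
    // its lineage is enough to rebuild that data elsewhere.
    println(replay(doubled))
  }
}
```

Because nodes are never mutated, extending a lineage leaves every earlier lineage intact, and any machine that receives the structure can recompute the data it describes.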

  25. A functional design for fault-tolerance
    Putting lineages to work
    Next: we formalize lineages, a concept from the
    database + systems communities, in the context of
    PL. Natural fit in context of functional programming!
    Formalization: typed, distributed core
    language with spores, silos, and futures.

  26. Properties of function passing model
    Formalization
    Subject reduction theorem guarantees
    preservation of types under reduction, as well as
    preservation of lineage mobility.
    Progress theorem guarantees the finite
    materialization of remote, lineage-based data.
    First correctness results for a programming model
    for lineage-based distributed computation.

  27. Building applications with function passing
    Built two miniaturized example systems
    inspired by popular big data frameworks.
    BabySpark (distributed collections)
    Implemented Spark RDD operators in terms of
    the primitives of function passing:
    map, reduce, groupBy, and join
    MBrace (F# async for distributing tasks)
    Emulated MBrace using the primitives of
    function passing.
    See https://github.com/phaller/f-p/

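How collection operators might be layered on the primitives can be sketched with a local stand-in. This is not the BabySpark code from the repository above: `Ref`, `force`, and the operator signatures are assumptions made for illustration, with `apply` plus `populate` as the only building blocks.

```scala
object BabyOps {
  // Hypothetical local stand-in for SiloRef; `force` plays the role of send().
  final class Ref[T](compute: () => T) {
    def apply[S](fun: T => Ref[S]): Ref[S] =
      new Ref[S](() => fun(compute()).force())
    def force(): T = compute()
  }
  def populate[T](value: T): Ref[T] = new Ref[T](() => value)

  // Each operator is a single staged `apply` that populates a new silo.
  def map[A, B](ref: Ref[List[A]])(f: A => B): Ref[List[B]] =
    ref.apply(xs => populate(xs.map(f)))

  def reduce[A](ref: Ref[List[A]])(f: (A, A) => A): Ref[A] =
    ref.apply(xs => populate(xs.reduce(f)))

  def groupBy[A, K](ref: Ref[List[A]])(key: A => K): Ref[Map[K, List[A]]] =
    ref.apply(xs => populate(xs.groupBy(key)))

  // A join needs both inputs, so the second `apply` is nested in the first,
  // the same shape as the owners example earlier in the deck.
  def join[K, A, B](left: Ref[List[(K, A)]],
                    right: Ref[List[(K, B)]]): Ref[List[(K, (A, B))]] =
    left.apply { ls =>
      right.apply { rs =>
        populate(ls.flatMap { case (k, a) =>
          rs.collect { case (k2, b) if k2 == k => (k, (a, b)) }
        })
      }
    }

  def main(args: Array[String]): Unit = {
    val total = reduce(map(populate(List(1, 2, 3)))(_ * 2))(_ + _)
    println(total.force())
  }
}
```

Nothing is evaluated while the operators are being composed; only `force()` runs the staged chain, which is what lets the composition double as a lineage.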

  28. Find out more!
    References
    Haller, Miller, and Müller. A Programming Model and Foundation for Lineage-Based
    Distributed Computation. 2017. Draft: https://infoscience.epfl.ch/record/230304
    Miller, Haller, Müller, and Boullier. Function passing: a model for typed,
    distributed functional programming. Onward! 2016

  29. Ongoing and future work
    Integrate function passing and serverless computing
    Lineage-driven fault-injection for function passing model
    Lineage-driven fault-injection for serverless computing
    Composition of serverless functions = serverless function
    Lineages provide
    • precise fault injection and recovery
    • a design space for perturbation models
    Thank you!
