Taming Concurrent Programming with Domain-Specific Languages

Philipp Haller
October 23, 2017

Transcript

  1. Taming Concurrent Programming
    with Domain-Specific Languages
    Philipp Haller
    KTH Royal Institute of Technology
    Stockholm, Sweden
    4th ACM SIGPLAN International Workshop on
    Software Engineering for Parallel Systems (SEPS '17)
    Vancouver, Canada, October 23rd, 2017

  2. Isn't Concurrent Programming a Solved Problem?
    There are promising concurrent programming
    models and abstractions!
    • Join-calculus
    • OpenMP
    • STM
    • Async/await
    • Reactive streams

    • Actors
    • Monitors
    • Futures, promises
    • CSP
    • MPI
    • Agents

  3. Concurrent programming models
    Why are there so many?
    Concurrent programming is difficult:
    How best to model systems?
    How best to exploit different forms of
    concurrency and parallelism?
    Multiple hazards: race conditions, deadlocks,
    livelocks, etc.

  4. Successes
    Example: Erlang
    Built-in, lightweight processes (actor model)
    Erlang without processes: a purely functional language
    Distributed by design
    "Process virtual machine"
    Monitoring, code hot swapping

  5. What Next?
    Great challenges remain, e.g.:
    How to efficiently support multiple forms of
    concurrency?
    How to make a variety of programming
    abstractions fault-tolerant?
    How to test and verify distributed programs?

  6. Domain-specific languages
    Enabling experimentation
    Progress on any of these challenges requires
    exploration and experimentation
    Implementing new compilers and runtime
    environments is expensive.
    Domain-specific languages (DSLs) to the rescue!

  7. Embedding DSLs
    Enabling experimentation
    Embedding in general-purpose languages
    enables reuse of infrastructure.
    Deep embedding: stage program written in DSL,
    analyze and transform staged representation.
    Shallow embedding: DSL = pure library
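
The two embedding styles can be contrasted on a tiny arithmetic DSL. This is only an illustrative sketch: the names `Shallow`, `Expr`, `simplify`, and `eval` are assumptions made for the example and do not come from the talk.

```scala
object EmbeddingStyles {
  // Shallow embedding: DSL operations are ordinary library functions,
  // so a DSL "program" is evaluated directly by the host language.
  object Shallow {
    def lit(n: Int): Int = n
    def add(a: Int, b: Int): Int = a + b
    def mul(a: Int, b: Int): Int = a * b
  }

  // Deep embedding: DSL operations build a data structure (the staged
  // program) that can be analyzed and transformed before it is run.
  sealed trait Expr
  final case class Lit(n: Int) extends Expr
  final case class Add(a: Expr, b: Expr) extends Expr
  final case class Mul(a: Expr, b: Expr) extends Expr

  // A transformation on the staged representation: drop `+ 0` and `* 1`.
  def simplify(e: Expr): Expr = e match {
    case Add(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Lit(0), x) => x
        case (x, Lit(0)) => x
        case (x, y)      => Add(x, y)
      }
    case Mul(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Lit(1), x) => x
        case (x, Lit(1)) => x
        case (x, y)      => Mul(x, y)
      }
    case lit: Lit => lit
  }

  // An interpreter for the staged representation.
  def eval(e: Expr): Int = e match {
    case Lit(n)    => n
    case Add(a, b) => eval(a) + eval(b)
    case Mul(a, b) => eval(a) * eval(b)
  }
}
```

With the shallow version there is nothing to inspect: `Shallow.add(...)` is already an `Int`. With the deep version, `simplify` can rewrite `Add(Lit(2), Mul(Lit(1), Lit(3)))` before `eval` runs it, at the cost of writing the interpreter.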

  8. A DSL for Data-Centric Distributed
    Programming

  9. High-level picture:
    wikipedia
    reduced, 48.4GB

  10. High-level picture:
    wikipedia
    reduced, 48.4GB
    Chunk up the data…

  11. High-level picture:
    Distribute it over your
    cluster of machines.

  12. High-level picture:
    From there, think of your distributed data like a
    single collection...
    Example:
    Transform the text of all wiki
    articles to lowercase.
    val wiki: RDD[WikiArticle] = ...
    wiki.map { article =>
      article.text.toLowerCase
    }

  13. Then, why do we build these systems
    using RPC or message passing?
    1) Computational pattern: send functions to data
    2) Fault recovery not a natural fit

  14. Idea:
    Capitalize on the structure of the problem:
    Simplifies fault tolerance by design
    Functional data structure falls out of this

  15. Distributed Programming with
    Functional Lineages
    Key idea: inversion of the actor model.
    New data-centric programming model for functional
    processing of distributed data.

  16. Key idea:
    Inversion of the actor model.
    Actors:
    Encapsulate state and behavior.
    Are stationary. (References to actors mobile.)
    Actors exchange data/commands
    through asynchronous messaging.

  17. Key idea:
    Inversion of the actor model.
    Actors:
    keep functionality stationary, send data.
    Functional lineages:
    keep data stationary, send functionality.
    (this work!)

  18. Key idea:
    Inversion of the actor model.
    Functional lineages: (this work!)
    Stateless. Persistent data structures.
    Keep data stationary.
    Functions are exchanged through
    asynchronous messaging.

  19. Introducing the Functional Lineages Model
    Consists of 3 parts:
    Silos: stationary, typed, immutable data
    containers.
    SiloRefs: references to local or remote
    Silos.
    Spores: safe, serializable functions.

  20. Some Visual Intuition of the Functional Lineages Model
    (Diagram labels: Master, Worker, Silo, SiloRef)

  21. Silos
    What are they?
    (Diagram labels: SiloRef[T], Silo[T], T)
    Two parts.
    Silo. Typed, stationary data container.
    SiloRef. Handle to a Silo.
    User interacts with SiloRef.
    SiloRefs come with 4 primitive operations:
    def apply
    def send
    def persist
    def unpersist

  22. Silos
    What are they?
    Primitive: apply (deferred)
    Takes a function that is to be applied to the data in the
    silo associated with the SiloRef.
    Creates a new silo to contain the data that the user-defined
    function returns; evaluation is deferred.
    def apply[S](fun: T => SiloRef[S]): SiloRef[S]
    Enables interesting computation DAGs.

  23. Silos
    What are they?
    Primitive: send (eager)
    Forces the built-up computation DAG to be sent to the
    associated node and applied.
    Future is completed with the result of the computation.
    def send(): Future[T]
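
The split between a deferred `apply` and an eager `send` can be imitated on a single machine. The sketch below is a toy model under stated assumptions, not the F-P library: `ToySiloRef`, its `populate` factory without a host argument, and `demo` are hypothetical names, and nothing is actually shipped between nodes.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ToySilos {
  // Hypothetical local stand-in for SiloRef[T]: it wraps a thunk that
  // describes how to produce the silo's data, but does not run it.
  final class ToySiloRef[T] private (private val compute: () => T) {
    // Deferred: only composes thunks; `fun` is not called here.
    def apply[S](fun: T => ToySiloRef[S]): ToySiloRef[S] =
      new ToySiloRef[S](() => fun(compute()).compute())

    // Eager: runs the whole chain asynchronously and yields the result.
    def send(): Future[T] = Future(compute())
  }

  object ToySiloRef {
    // Assumption: no host parameter, since everything stays in one JVM.
    def populate[T](value: T): ToySiloRef[T] =
      new ToySiloRef[T](() => value)
  }

  // Returns (function calls before send, result, function calls after send).
  def demo(): (Int, Int, Int) = {
    val calls = new AtomicInteger(0)
    val doubled = ToySiloRef.populate(21).apply { x =>
      calls.incrementAndGet()
      ToySiloRef.populate(x * 2)
    }
    val before = calls.get()
    val result = Await.result(doubled.send(), 5.seconds)
    (before, result, calls.get())
  }
}
```

In `demo`, the user function has not run by the time `apply` returns; it runs once, when `send` forces the chain and completes the future.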

  24. Silos
    What are they?
    Primitive: persist (deferred)
    Ensures silo is cached in memory.
    def persist(): SiloRef[T]

  25. Silos
    What are they?
    Primitive: unpersist (deferred)
    Enables silo to be removed from memory.
    def unpersist(): SiloRef[T]

  26. Silos
    What are they?
    Silo factories (deferred):
    Creates silo on given host containing given value/text file/…
    object SiloRef {
      def populate[T](host: Host, value: T): SiloRef[T]
      def fromTextFile(host: Host, file: File): SiloRef[List[String]]
      ...
    }

  27. Basic idea: apply/send
    (Diagram labels: Machine 1, Machine 2, SiloRef[T], Silo[T], T, λ: T ⇒ SiloRef[S], SiloRef[S], Silo[S], S)

  28. The Problem with Closures
    Distributing Functions
    class MyCoolApp {
      val param = 42
      val log = new Log(...)
      ...
      def work(silo: SiloRef[Int]) = {
        silo.apply(x =>
          SiloRef.populate(currentHost, x + param)
        ).send()
      }
    }

  29. The Problem with Closures
    Distributing Functions
    class MyCoolApp {
      val param = 42
      val log = new Log(...)
      ...
      def work(silo: SiloRef[Int]) = {
        silo.apply(x =>
          SiloRef.populate(currentHost, x + this.param)
        ).send()
      }
    }
    Accidental capture of a non-serializable object.

  31. The Problem with Closures: Solution
    Distributing Functions
    class MyCoolApp {
      val param = 42
      val log = new Log(...)
      ...
      def work(silo: SiloRef[Int]) = {
        silo.apply(spore {
          val localParam = this.param // spore header
          x =>                        // spore body
            SiloRef.populate(currentHost, x + localParam)
        }).send()
      }
    }
    Miller, Haller, and Odersky. Spores: a type-based foundation for closures
    in the age of concurrency and distribution. ECOOP 2014
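
The capture problem can be reproduced with plain Java serialization, without the spores library. The sketch below rests on several assumptions: `Log` is an empty stand-in class, `unsafeAdder`, `safeAdder`, and `canSerialize` are hypothetical helpers, and the code targets Scala 2.12 or later, where function literals are serializable when everything they capture is. The local `val` only imitates the role of a spore header; it gives none of the static guarantees that spores provide.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCapture {
  // Stand-in for a resource that must not be shipped to another machine.
  class Log

  class MyCoolApp {
    val param: Int = 42
    val log = new Log

    // Reading `this.param` makes the closure capture `this`,
    // and MyCoolApp (like Log) is not serializable.
    def unsafeAdder: Int => Int = x => x + this.param

    // Copying the value into a local first means only an Int is captured,
    // which mirrors what a spore header does explicitly.
    def safeAdder: Int => Int = {
      val localParam = this.param
      x => x + localParam
    }
  }

  // Attempts Java serialization and reports whether it succeeded.
  def canSerialize(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Calling `canSerialize` on `unsafeAdder` fails at run time because the closure drags in the enclosing `MyCoolApp`, whereas `safeAdder` carries only an `Int`. Spores aim to surface this kind of mistake at compile time instead.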

  32. More involved example
    Let’s make an interesting DAG!
    (Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], persons)
    val persons: SiloRef[List[Person]] = ...

  33. More involved example
    Let’s make an interesting DAG!
    (Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], persons, adults)
    val persons: SiloRef[List[Person]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })

  34. More involved example
    Let’s make an interesting DAG!
    (Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], persons, adults, owners, vehicles)
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(spore {
      val localVehicles = vehicles // spore header
      ps =>
        localVehicles.apply(spore {
          val localps = ps // spore header
          vs =>
            SiloRef.populate(currentHost,
              localps.flatMap(p =>
                // list of (p, v) for a single person p
                vs.flatMap { v =>
                  if (v.owner.name == p.name) List((p, v))
                  else Nil
                }
              ))
        })
    })

  35. More involved example
    Let’s make an interesting DAG!
    (Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], persons, adults, owners, vehicles)
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)

  36. More involved example
    Let’s make an interesting DAG!
    (Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], persons, adults, owners, vehicles, sorted, labels)
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)
    val sorted =
      adults.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.sortBy(p => p.age))
      })
    val labels =
      sorted.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.map(p => "Hi, " + p.name))
      })

  37. More involved example
    Let’s make an interesting DAG!
    (Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], persons, adults, owners, vehicles, sorted, labels)
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)
    val sorted =
      adults.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.sortBy(p => p.age))
      })
    val labels =
      sorted.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.map(p => "Hi, " + p.name))
      })
    So far we have just staged the computation;
    we haven’t yet “kicked it off”.

  38. More involved example
    Let’s make an interesting DAG!
    (Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], persons, adults, owners, vehicles, sorted, labels, λ: List[Person] ⇒ List[String], Silo[List[String]])
    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...
    val adults =
      persons.apply(spore { ps =>
        val res = ps.filter(p => p.age >= 18)
        SiloRef.populate(currentHost, res)
      })
    // adults that own a vehicle
    val owners = adults.apply(...)
    val sorted =
      adults.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.sortBy(p => p.age))
      })
    val labels =
      sorted.apply(spore { ps =>
        SiloRef.populate(currentHost,
          ps.map(p => "Hi, " + p.name))
      })
    labels.persist().send()
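
To see what the staged DAG computes, the same steps can be run on ordinary local lists. This is only a local analogue: the `Person` and `Vehicle` definitions and the sample data are assumptions (the slides use only `p.age`, `p.name`, and `v.owner.name`), and no silos, spores, or hosts are involved.

```scala
object LocalPipeline {
  // Assumed shapes; only the fields used on the slides are modelled.
  final case class Person(name: String, age: Int)
  final case class Vehicle(model: String, owner: Person)

  // Made-up sample data.
  val persons: List[Person] =
    List(Person("Ada", 36), Person("Bo", 12), Person("Cy", 18))
  val vehicles: List[Vehicle] =
    List(Vehicle("van", Person("Ada", 36)))

  // Same predicate as the `adults` silo.
  val adults: List[Person] = persons.filter(p => p.age >= 18)

  // Adults that own a vehicle: pairs (p, v) matched on the owner's name.
  val owners: List[(Person, Vehicle)] =
    adults.flatMap { p =>
      vehicles.flatMap { v =>
        if (v.owner.name == p.name) List((p, v)) else Nil
      }
    }

  // Same steps as the `sorted` and `labels` silos.
  val sorted: List[Person] = adults.sortBy(p => p.age)
  val labels: List[String] = sorted.map(p => "Hi, " + p.name)
}
```

The difference in the distributed version is that each `val` is a `SiloRef` describing a step, and nothing runs until `send()` is called on the end of the chain.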

  39. Putting lineages to work
    A functional design for fault-tolerance
    Silos and SiloRefs relate to each other by means
    of lineages, persistent data structures.
    The lineage is based on the DAG of operations to
    derive the data of each silo.
    Since the lineage is composed of spores, it is
    serializable. This means it can be persisted or
    transferred to other machines.

  40. Putting lineages to work
    A functional design for fault-tolerance
    Intuition: Spores & SiloRefs are safe to serialize.
    Therefore, we can save entire DAGs, share them, and
    use them to restart computations.
    Next: we formalize lineages, a concept from the
    database + systems communities, in the context of
    PL. Natural fit in context of functional programming!
    Formalization: typed, distributed core
    language with spores, silos, and futures.
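
The idea of restarting a computation from its lineage can be sketched locally. Everything here is hypothetical (`Lineage`, `Source`, `Derived`, `ToySilo`, `crash`), and the sketch keeps plain functions in memory; it does not serialize the lineage, which is the property that spores provide in the real model.

```scala
object LineageReplay {
  // A lineage node records how a value is derived, not the value itself.
  sealed trait Lineage[T] {
    def replay(): T
    def derive[S](f: T => S): Lineage[S] = Derived(this, f)
  }

  // Root of a lineage: how to load the original data.
  final case class Source[T](load: () => T) extends Lineage[T] {
    def replay(): T = load()
  }

  // One derivation step applied to the parent's data.
  final case class Derived[A, T](parent: Lineage[A], f: A => T)
      extends Lineage[T] {
    def replay(): T = f(parent.replay())
  }

  // A toy silo: caches its data, but can rebuild it from the lineage.
  final class ToySilo[T](lineage: Lineage[T]) {
    private var cached: Option[T] = None
    var recomputations: Int = 0

    def get(): T = cached match {
      case Some(value) => value
      case None =>
        recomputations += 1
        val value = lineage.replay()
        cached = Some(value)
        value
    }

    // Simulates a failure that loses the in-memory data.
    def crash(): Unit = cached = None
  }
}
```

A silo built from `Source(() => List(3, 1, 2)).derive(_.sorted).derive(_.map(_ * 10))` returns the same list before and after `crash()`; the second `get()` simply replays the recorded steps.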

  41. Properties of Functional Lineages
    Formalization
    Subject reduction theorem guarantees
    preservation of types under reduction, as well as
    preservation of lineage mobility.
    Progress theorem guarantees the finite
    materialization of remote, lineage-based data.
    First correctness results for a programming model
    for lineage-based distributed computation.

  42. Building Applications with Functional Lineages
    Built two miniaturized example systems
    inspired by popular big data frameworks.
    Apache Spark (distributed collections)
    Implemented Spark RDD operators in terms of
    the primitives of Functional Lineages:
    map, reduce, groupBy, and join
    MBrace (F# async for distributing tasks)
    Emulated MBrace using the primitives of
    Functional Lineages.
    See https://github.com/heathermiller/f-p
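
How collection operators might reduce to a single `apply` primitive can be shown with a local toy. This is a sketch under assumptions, not the code in the linked repository: `Ref` is a hypothetical single-machine stand-in for `SiloRef`, `populate` takes no host, and `run` replaces `send()`.

```scala
object ToyOperators {
  // Minimal local stand-in for a SiloRef: a deferred computation.
  final class Ref[T](val run: () => T) {
    def apply[S](fun: T => Ref[S]): Ref[S] =
      new Ref[S](() => fun(run()).run())
  }

  def populate[T](value: T): Ref[T] = new Ref[T](() => value)

  // Each operator is one `apply` whose function builds the new data.
  def map[A, B](ref: Ref[List[A]])(f: A => B): Ref[List[B]] =
    ref.apply(xs => populate(xs.map(f)))

  def reduce[A](ref: Ref[List[A]])(op: (A, A) => A): Ref[A] =
    ref.apply(xs => populate(xs.reduce(op)))

  def groupBy[A, K](ref: Ref[List[A]])(key: A => K): Ref[Map[K, List[A]]] =
    ref.apply(xs => populate(xs.groupBy(key)))

  // A join nests two applies, like the `owners` example on the slides.
  def join[K, A, B](left: Ref[List[(K, A)]],
                    right: Ref[List[(K, B)]]): Ref[List[(K, (A, B))]] =
    left.apply { ls =>
      right.apply { rs =>
        populate(for ((k, a) <- ls; (k2, b) <- rs if k == k2) yield (k, (a, b)))
      }
    }
}
```

For example, `ToyOperators.map(ToyOperators.populate(List(1, 2, 3, 4)))(_ * 2)` only stages the work; calling `.run()` on the result evaluates it.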

  43. Revisiting safety
    Preventing unsafe state
    Accessing global, mutable state from within silos
    is undefined and meaningless.
    Additional static checking required to prevent
    undefined accesses.
    Proposal: objects put into silos must conform to the
    object capability model.
    Mark S. Miller. Robust Composition: Towards a Unified Approach to
    Access Control and Concurrency Control. PhD thesis, 2006

  44. Object capability model in Scala
    Empirical results
    LaCasa: Scala extension implementing the object capability model
    Empirical results show: many existing class definitions conform to
    the object capability model.
    Project          #classes/traits  #ocap (%)     #dir. insec. (%)
    Scala stdlib     1,505            644 (43%)     212/861 (25%)
    Signal/Collect   236              159 (67%)     60/77 (78%)
    GeoTrellis
      -engine        190              40 (21%)      124/150 (83%)
      -raster        670              233 (35%)     325/437 (74%)
      -spark         326              101 (31%)     167/225 (74%)
    Total            2,927            1,177 (40%)   888/1,750 (51%)

  45. Limitations of Embedded DSLs
    DSLs realized as shallow embeddings limit:
    DSL definition:
    Control flow (e.g., no first-class continuations)
    Static checking (e.g., no type system extensions)
    DSL implementation:
    Same runtime environment as host language
    "The Next 700 Asynchronous Programming Models", ACM SPLASH-I 2013
    https://www.infoq.com/presentations/rx-async

  46. Addressing DSL limitations
    Improving DSL embedding
    Experimentation with a variety of DSLs helps
    identify limitations and discover constructs
    applicable to multiple DSLs.
    Possible approach: extend general-purpose programming
    languages to improve embedding of concurrency DSLs.
    Shown language extensions:
    Spores: safe, serializable closures
    Object-capability security

  47. Find out more!
    References
    Spores: safe, serializable closures
    Miller, Haller, and Odersky. Spores: a type-based foundation for closures
    in the age of concurrency and distribution. ECOOP 2014
    Functional Lineages:
    Haller, Miller, and Müller. A Programming Model and Foundation for Lineage-Based
    Distributed Computation. 2017. Draft: https://infoscience.epfl.ch/record/230304
    Miller, Haller, Müller, and Boullier. Function passing: a model for typed,
    distributed functional programming. Onward! 2016
    Object capabilities and affine types in Scala:
    Haller and Loiko. LaCasa: lightweight affinity and object capabilities in
    Scala. OOPSLA 2016

  48. Conclusions
    Lessons learnt
    Programming languages help simplify
    concurrency and distribution.
    DSLs enable experimenting with new
    programming models and constructs.
    Language extensions increase expressiveness and
    safety of DSLs. Extensions should not be DSL-specific.
    We have seen this at work in an embedded DSL
    implementing Functional Lineages.