Slide 1

Slide 1 text

Taming Concurrent Programming with Domain-Specific Languages
Philipp Haller
KTH Royal Institute of Technology, Stockholm, Sweden
4th ACM SIGPLAN International Workshop on Software Engineering for Parallel Systems (SEPS '17)
Vancouver, Canada, October 23rd, 2017

Slide 2

Slide 2 text

Isn't Concurrent Programming a Solved Problem?
There are promising concurrent programming models and abstractions!
• Actors
• Monitors
• Futures, promises
• CSP
• MPI
• Agents
• Join-calculus
• OpenMP
• STM
• Async/await
• Reactive streams
• …

Slide 3

Slide 3 text

Concurrent programming models: Why are there so many?
Concurrent programming is difficult:
• Multiple hazards: race conditions, deadlocks, livelocks, etc.
• How best to exploit different forms of concurrency and parallelism?
• How best to model systems?

Slide 4

Slide 4 text

Successes
Example: Erlang
• Built-in, lightweight processes (actor model)
• Erlang without processes: a purely functional language
• Distributed by design
• "Process virtual machine"
• Monitoring, code hot swapping

Slide 5

Slide 5 text

What Next?
Great challenges remain, e.g.:
• How to efficiently support multiple forms of concurrency?
• How to make a variety of programming abstractions fault-tolerant?
• How to test and verify distributed programs?

Slide 6

Slide 6 text

Domain-specific languages: Enabling experimentation
• Progress on any of these challenges requires exploration and experimentation.
• Implementing new compilers and runtime environments is expensive.
• Domain-specific languages (DSLs) to the rescue!

Slide 7

Slide 7 text

Embedding DSLs: Enabling experimentation
Embedding in general-purpose languages enables reuse of infrastructure.
• Shallow embedding: DSL = pure library.
• Deep embedding: stage program written in DSL, analyze and transform staged representation.
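The difference between the two embedding styles can be illustrated with a toy arithmetic DSL. This is only a sketch: the names `Shallow`, `Expr`, `Lit`, `Add`, `simplify`, and `eval` are hypothetical and do not come from the talk.

```scala
// Hypothetical toy DSL for arithmetic; all names are illustrative.
object EmbeddingSketch {
  // Shallow embedding: DSL terms are ordinary host-language values,
  // so a DSL program is evaluated as soon as it is written down.
  object Shallow {
    def lit(n: Int): Int = n
    def add(a: Int, b: Int): Int = a + b
  }

  // Deep embedding: DSL terms build a data structure (the staged program).
  sealed trait Expr
  final case class Lit(n: Int) extends Expr
  final case class Add(a: Expr, b: Expr) extends Expr

  // Analyze and transform the staged representation: drop additions of zero.
  def simplify(e: Expr): Expr = e match {
    case Add(a, b) =>
      (simplify(a), simplify(b)) match {
        case (Lit(0), r) => r
        case (l, Lit(0)) => l
        case (l, r)      => Add(l, r)
      }
    case other => other
  }

  // Interpret the staged representation.
  def eval(e: Expr): Int = e match {
    case Lit(n)    => n
    case Add(a, b) => eval(a) + eval(b)
  }
}
```

With the shallow variant there is nothing to inspect before evaluation. The deep variant can be rewritten first, for example `simplify(Add(Lit(0), Lit(2)))` yields `Lit(2)`, at the cost of writing an interpreter.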

Slide 8

Slide 8 text

A DSL for Data-Centric Distributed Programming

Slide 9

Slide 9 text

High-level picture: wikipedia reduced, 48.4GB

Slide 10

Slide 10 text

High-level picture: wikipedia reduced, 48.4GB
Chunk up the data…

Slide 11

Slide 11 text

High-level picture:
Distribute it over your cluster of machines.

Slide 12

Slide 12 text

High-level picture:
From there, think of your distributed data like a single collection...
Example: Transform the text of all wiki articles to lowercase.

    val wiki: RDD[WikiArticle] = ...
    wiki.map { article => article.text.toLowerCase }

Slide 13

Slide 13 text

Then, why do we build these systems using RPC or message passing?
1) Computational pattern: send functions to data
2) Fault recovery not a natural fit

Slide 14

Slide 14 text

Idea: Capitalize on the structure of the problem:
• Simplifies fault tolerance by design
• Functional data structure falls out of this

Slide 15

Slide 15 text

Distributed Programming with Functional Lineages
New data-centric programming model for functional processing of distributed data.
Key idea: inversion of the actor model.

Slide 16

Slide 16 text

Key idea: Inversion of the actor model.
Actors:
• Encapsulate state and behavior.
• Are stationary. (References to actors are mobile.)
• Exchange data/commands through asynchronous messaging.

Slide 17

Slide 17 text

Key idea: Inversion of the actor model.
• Actors: keep functionality stationary, send data.
• Functional lineages (this work!): keep data stationary, send functionality.

Slide 18

Slide 18 text

Key idea: Inversion of the actor model.
Functional lineages (this work!):
• Stateless. Persistent data structures.
• Keep data stationary.
• Functions are exchanged through asynchronous messaging.

Slide 19

Slide 19 text

Introducing the Functional Lineages Model
Consists of 3 parts:
• Silos: stationary, typed, immutable data containers.
• SiloRefs: references to local or remote Silos.
• Spores: safe, serializable functions.

Slide 20

Slide 20 text

Some Visual Intuition of the Functional Lineages Model
[Diagram labels: Master, Worker, SiloRef, Silo]

Slide 21

Slide 21 text

Silos: What are they?
Two parts:
• SiloRef[T]. Handle to a Silo.
• Silo[T]. Typed, stationary data container.
User interacts with SiloRef. SiloRefs come with 4 primitive operations:

    def apply
    def send
    def persist
    def unpersist

Slide 22

Slide 22 text

Silos: What are they?
Primitive: apply (deferred)
• Takes a function that is to be applied to the data in the silo associated with the SiloRef.
• Creates a new silo to contain the data that the user-defined function returns; evaluation is deferred.
• Enables interesting computation DAGs.

    def apply[S](fun: T => SiloRef[S]): SiloRef[S]

Slide 23

Slide 23 text

Silos: What are they?
Primitive: send (eager)
• Forces the built-up computation DAG to be sent to the associated node and applied.
• Future is completed with the result of the computation.

    def send(): Future[T]
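The split between a deferred `apply` and an eager `send` can be sketched in a single process. `LocalSiloRef` below is a hypothetical stand-in, not the library's `SiloRef`: there are no hosts, no network, and no spores, and the `populate` factory takes only a value.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object SiloSketch {
  // Hypothetical single-process stand-in for a SiloRef: it only records
  // how to compute its data; nothing runs until send() is called.
  final class LocalSiloRef[T] private (compute: () => T) {
    // Deferred: returns a new reference whose data is derived from this one.
    def apply[S](fun: T => LocalSiloRef[S]): LocalSiloRef[S] =
      new LocalSiloRef[S](() => fun(compute()).run())

    // Eager: runs the recorded computation and completes a future with the result.
    def send()(implicit ec: ExecutionContext): Future[T] = Future(compute())

    private def run(): T = compute()
  }

  object LocalSiloRef {
    // The by-name parameter keeps even the initial value unevaluated until send().
    def populate[T](value: => T): LocalSiloRef[T] = new LocalSiloRef[T](() => value)
  }

  def demo(): Int = {
    import ExecutionContext.Implicits.global
    val base = LocalSiloRef.populate(20)
    val derived = base.apply(x => LocalSiloRef.populate(x + 22)) // nothing runs yet
    Await.result(derived.send(), 5.seconds)                      // computation runs here
  }
}
```

In this sketch every `send()` recomputes from the start; caching, which `persist` addresses in the real model, is left out.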

Slide 24

Slide 24 text

Silos: What are they?
Primitive: persist (deferred)
• Ensures silo is cached in memory.

    def persist(): SiloRef[T]

Slide 25

Slide 25 text

Silos: What are they?
Primitive: unpersist (deferred)
• Enables silo to be removed from memory.

    def unpersist(): SiloRef[T]

Slide 26

Slide 26 text

Silos: What are they?
Silo factories: create a silo on the given host containing the given value/text file/…

    object SiloRef {
      def populate[T](host: Host, value: T): SiloRef[T]
      def fromTextFile(host: Host, file: File): SiloRef[List[String]]
      ...
    }

Slide 27

Slide 27 text

Basic idea: apply/send
[Diagram labels: Machine 1, Machine 2, SiloRef[T], Silo[T], λ: T ⇒ SiloRef[S], SiloRef[S], Silo[S]]

Slide 28

Slide 28 text

Distributing Functions
The Problem with Closures

    class MyCoolApp {
      val param = 42
      val log = new Log(...)
      ...
      def work(silo: SiloRef[Int]) = {
        silo.apply(x =>
          SiloRef.populate(currentHost, x + param)
        ).send()
      }
    }

Slide 29

Slide 29 text

Distributing Functions
The Problem with Closures

    class MyCoolApp {
      val param = 42
      val log = new Log(...)
      ...
      def work(silo: SiloRef[Int]) = {
        silo.apply(x =>
          SiloRef.populate(currentHost, x + this.param)
        ).send()
      }
    }

Accidental capture of a non-serializable object: `param` is really `this.param`, so the closure captures `this`, and with it the `log` field.

Slide 31

Slide 31 text

Distributing Functions
The Problem with Closures: Solution

    class MyCoolApp {
      val param = 42
      val log = new Log(...)
      ...
      def work(silo: SiloRef[Int]) = {
        silo.apply(spore {
          val localParam = this.param                        // spore header
          x => SiloRef.populate(currentHost, x + localParam) // spore body
        }).send()
      }
    }

Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
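The capture problem can be observed at runtime with plain Java serialization. The sketch below does not use the spores library, which reports such captures at compile time. It assumes Scala 2.12 or later (where function literals compile to serializable lambdas), and `Log`, `capturing`, `copying`, and `serializes` are hypothetical names.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object CaptureSketch {
  class Log // deliberately not Serializable

  class MyCoolApp { // also not Serializable
    val param = 42
    val log = new Log

    // Refers to this.param, so the closure captures `this` (and with it `log`).
    def capturing: Int => Int = x => x + param

    // Copies the field into a local value first; only an Int is captured.
    def copying: Int => Int = {
      val localParam = param
      x => x + localParam
    }
  }

  // Returns true if Java serialization accepts the value.
  def serializes(value: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(value)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Here `serializes(app.capturing)` fails while `serializes(app.copying)` succeeds, which is the same local-copy pattern that the spore header makes explicit.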

Slide 32

Slide 32 text

More involved example
Let’s make an interesting DAG!
[Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]]; DAG node: persons]

    val persons: SiloRef[List[Person]] = ...

Slide 33

Slide 33 text

More involved example
Let’s make an interesting DAG!
[Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]]; DAG nodes: persons, adults]

    val persons: SiloRef[List[Person]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

Slide 34

Slide 34 text

More involved example
Let’s make an interesting DAG!
[Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]]; DAG nodes: persons, adults, vehicles, owners]

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(spore {
      val localVehicles = vehicles // spore header
      ps => localVehicles.apply(spore {
        val localps = ps // spore header
        vs => SiloRef.populate(currentHost,
          localps.flatMap(p =>
            // list of (p, v) for a single person p
            vs.flatMap { v =>
              if (v.owner.name == p.name) List((p, v)) else Nil
            }
          )
        )
      })
    })

Slide 35

Slide 35 text

More involved example
Let’s make an interesting DAG!
[Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]]; DAG nodes: persons, adults, vehicles, owners]

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

Slide 36

Slide 36 text

More involved example
Let’s make an interesting DAG!
[Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]]; DAG nodes: persons, adults, vehicles, owners, sorted, labels]

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })

Slide 37

Slide 37 text

More involved example
Let’s make an interesting DAG!
[Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]]; DAG nodes: persons, adults, vehicles, owners, sorted, labels]

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })

So far we have only staged the computation; we haven’t yet “kicked it off”.

Slide 38

Slide 38 text

More involved example
Let’s make an interesting DAG!
[Diagram labels: Machine 1, Machine 2, SiloRef[List[Person]], Silo[List[Person]], λ: List[Person] ⇒ List[String], Silo[List[String]]; DAG nodes: persons, adults, vehicles, owners, sorted, labels]

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(...)

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })

    labels.persist().send()

Slide 39

Slide 39 text

Putting lineages to work: A functional design for fault-tolerance
• Silos and SiloRefs relate to each other by means of lineages, persistent data structures.
• The lineage is based on the DAG of operations to derive the data of each silo.
• Since the lineage is composed of spores, it is serializable. This means it can be persisted or transferred to other machines.
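One way to picture a lineage as a persistent data structure is an immutable tree of derivation steps that can be replayed whenever the derived data is lost. The encoding below is a hypothetical sketch (`Lineage`, `Source`, and `Derived` are illustrative names); serialization and distribution are left out.

```scala
object LineageSketch {
  // Hypothetical encoding: a lineage is an immutable description of how
  // a silo's data is derived, not the data itself.
  sealed trait Lineage[T] {
    // Recomputes the data from scratch by replaying the recorded steps.
    def materialize(): T
  }

  // A leaf of the DAG: data loaded from some source.
  final case class Source[T](load: () => T) extends Lineage[T] {
    def materialize(): T = load()
  }

  // An inner node of the DAG: data derived from a parent by one step.
  final case class Derived[A, T](parent: Lineage[A], step: A => T) extends Lineage[T] {
    def materialize(): T = step(parent.materialize())
  }
}
```

For example, `Derived(Source(() => List(15, 30, 42)), (ages: List[Int]) => ages.filter(_ >= 18))` can be materialized any number of times. A node that loses its copy of the result only needs the lineage value to rebuild it.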

Slide 40

Slide 40 text

Putting lineages to work: A functional design for fault-tolerance
Intuition: Spores & SiloRefs are safe to serialize. Therefore, we can save entire DAGs, share them, and use them to restart computations.
Next: we formalize lineages, a concept from the database + systems communities, in the context of PL. Natural fit in context of functional programming!
Formalization: typed, distributed core language with spores, silos, and futures.

Slide 41

Slide 41 text

Formalization: Properties of Functional Lineages
• Subject reduction theorem guarantees preservation of types under reduction, as well as preservation of lineage mobility.
• Progress theorem guarantees the finite materialization of remote, lineage-based data.
First correctness results for a programming model for lineage-based distributed computation.

Slide 42

Slide 42 text

Building Applications with Functional Lineages
Built two miniaturized example systems inspired by popular big data frameworks.
• Apache Spark (distributed collections): Implemented Spark RDD operators in terms of the primitives of Functional Lineages: map, reduce, groupBy, and join.
• MBrace (F# async for distributing tasks): Emulated MBrace using the primitives of Functional Lineages.
See https://github.com/heathermiller/f-p
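How collection operators reduce to the `apply` primitive can be sketched as follows. This is not the code from the linked repository: `Ref` is a hypothetical in-memory stand-in for a SiloRef, and `populate` omits the host argument.

```scala
object OperatorSketch {
  // Hypothetical in-memory stand-in exposing only what the operators need.
  final class Ref[T](val get: () => T) {
    // Deferred, as in the model: builds a new reference from this one.
    def apply[S](fun: T => Ref[S]): Ref[S] = new Ref[S](() => fun(get()).get())
  }

  def populate[T](value: T): Ref[T] = new Ref[T](() => value)

  // map over the elements of a list silo, written only with apply/populate.
  def map[A, B](ref: Ref[List[A]])(f: A => B): Ref[List[B]] =
    ref.apply(xs => populate(xs.map(f)))

  // reduce a (non-empty) list silo to a single value.
  def reduce[A](ref: Ref[List[A]])(op: (A, A) => A): Ref[A] =
    ref.apply(xs => populate(xs.reduce(op)))
}
```

Usage: `reduce(map(populate(List(1, 2, 3)))(_ * 2))(_ + _).get()` doubles each element and then sums the results. Operators such as groupBy and join would additionally need to combine several silos, which this sketch does not attempt.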

Slide 43

Slide 43 text

Revisiting safety: Preventing unsafe state
• Accessing global, mutable state from within silos is undefined and meaningless.
• Additional static checking is required to prevent undefined accesses.
• Proposal: objects put into silos must conform to the object capability model.
Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, 2006

Slide 44

Slide 44 text

Object capability model in Scala: Empirical results
LaCasa: Scala extension implementing the object capability model.
Empirical results show: many existing class definitions conform to the object capability model.

Project              #classes/traits   #ocap (%)       #dir. insec. / #non-ocap (%)
Scala stdlib         1,505             644 (43%)       212/861 (25%)
Signal/Collect       236               159 (67%)       60/77 (78%)
GeoTrellis
  -engine            190               40 (21%)        124/150 (83%)
  -raster            670               233 (35%)       325/437 (74%)
  -spark             326               101 (31%)       167/225 (74%)
Total                2,927             1,177 (40%)     888/1,750 (51%)

Slide 45

Slide 45 text

Limitations of Embedded DSLs
DSLs realized as shallow embeddings limit:
• DSL definition:
  • Control flow (e.g., no first-class continuations)
  • Static checking (e.g., no type system extensions)
• DSL implementation:
  • Same runtime environment as host language
"The Next 700 Asynchronous Programming Models", ACM SPLASH-I 2013, https://www.infoq.com/presentations/rx-async

Slide 46

Slide 46 text

Addressing DSL limitations: Improving DSL embedding
Possible approach: extend general-purpose programming languages to improve embedding of concurrency DSLs.
Experimentation with a variety of DSLs helps identify limitations and discover constructs applicable to multiple DSLs.
Shown language extensions:
• Spores: safe, serializable closures
• Object-capability security

Slide 47

Slide 47 text

Find out more! References
Spores: safe, serializable closures:
• Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
Functional Lineages:
• Haller, Miller, and Müller. A Programming Model and Foundation for Lineage-Based Distributed Computation. 2017. Draft: https://infoscience.epfl.ch/record/230304
• Miller, Haller, Müller, and Boullier. Function passing: a model for typed, distributed functional programming. Onward! 2016
Object capabilities and affine types in Scala:
• Haller and Loiko. LaCasa: lightweight affinity and object capabilities in Scala. OOPSLA 2016

Slide 48

Slide 48 text

Conclusions: Lessons learnt
• Programming languages help simplify concurrency and distribution.
• DSLs enable experimenting with new programming models and constructs.
• Language extensions increase expressiveness and safety of DSLs.
• Extensions should not be DSL-specific.
We have seen this at work in an embedded DSL implementing Functional Lineages.