• Rapidly increasing scale of workloads:
  – CERN amassed about 200 PB of data from over 800 trillion collisions searching for the Higgs boson. [1]
  – Steam, a digital content distribution service, delivers 16.9 PB per week to users in Germany (USA: 46.9 PB), as of 4 February 2018 [2]
  – Twitter has about 330 million monthly active users (Q4, 2017) [3]
• Reacting at the speed of the environment (guaranteed timely responses)
  – Example: autonomous driving
• High availability
• Fault tolerance
the complex interplay of:
  – concurrency of computations
  – asynchronicity of events
  – failure of communication and/or systems
• An extreme challenge even for expert programmers
for data-race safe concurrency
• Part 2: Practical deterministic concurrency
• Part 3: Lineage-based distributed programming
• Ongoing and future work
• Conclusion
a data race?
• A data race occurs
  – when two tasks (threads, processes, actors) concurrently access the same shared variable (or object field) and
  – at least one of the accesses is a write (an assignment)
• In practice, data races are difficult to find and fix
• They can have dramatic consequences…
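To make the definition concrete, here is a minimal sketch that is not part of the talk; the names `Counter`, `racyCount` and `safeCount` are illustrative assumptions. Two threads increment the same field. In `racyCount` both threads read and write `value` without synchronization, so updates can be lost; in `safeCount` an `AtomicInteger` makes each increment atomic.

```scala
import java.util.concurrent.atomic.AtomicInteger

object DataRaceSketch {
  // Shared mutable state (hypothetical example class).
  final class Counter { var value: Int = 0 }

  // Runs the same body on two threads and waits for both to finish.
  private def runTwoThreads(body: () => Unit): Unit = {
    val threads = List.fill(2)(new Thread(new Runnable { def run(): Unit = body() }))
    threads.foreach(_.start())
    threads.foreach(_.join())
  }

  // Data race: two tasks access `value` concurrently and both write to it.
  def racyCount(n: Int): Int = {
    val c = new Counter
    runTwoThreads { () =>
      var i = 0
      while (i < n) { c.value += 1; i += 1 }
    }
    c.value // may be smaller than 2 * n because increments can be lost
  }

  // No data race: every increment is a single atomic operation.
  def safeCount(n: Int): Int = {
    val c = new AtomicInteger(0)
    runTwoThreads { () =>
      var i = 0
      while (i < n) { c.incrementAndGet(); i += 1 }
    }
    c.get()
  }
}
```

The racy variant may return a different number on every run, which is one reason such bugs are hard to reproduce and fix.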
power outage throughout parts of the Northeastern and Midwestern US and the Canadian province of Ontario on August 14, 2003
Primary cause: a data-race bug in the alarm system at the control room of FirstEnergy Corporation
safety for their provided or enabled concurrency abstractions
IEEE Spectrum ranking "Top Programming Languages 2018" ("Trending" preset): https://spectrum.ieee.org/static/interactive-the-top-programming-languages-2018
a lightweight type system
• that minimizes the effort to reuse existing code
Focus:
• Existing, full-featured languages like Scala (in contrast to new language designs like Rust)
progress in type systems for safe concurrency (linear and affine types, static capabilities, uniqueness types, ownership types, region inference, etc.)
• Challenges:
  – Sound integration with advanced type system features (example: local type inference)
  – Adoption at large scale
• Key: reuse of existing code
  – Image data is large
• Approach for high performance:
  – Each pipeline stage is a concurrent actor
  – In-place update of image buffers
  – Pass mutable buffers by reference between actors
1. Stage 1 sends a reference to a buffer to stage 2
2. Following the send, both stages have a reference to the same buffer
3. Stages can concurrently access the buffer
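The three steps can be replayed in a small sketch. This is an illustration under assumptions, not code from the talk: a `LinkedBlockingQueue` stands in for the mailbox of the stage 2 actor, and everything runs sequentially so that the outcome is reproducible. With two real threads, step 3 would be a data race.

```scala
import java.util.concurrent.LinkedBlockingQueue

object AliasingSketch {
  // Hypothetical mailbox of stage 2; a plain queue stands in for an actor.
  val stage2Mailbox = new LinkedBlockingQueue[Array[Byte]]()

  def demo(): (Boolean, Byte) = {
    val buffer = Array[Byte](1, 2, 3, 4)
    stage2Mailbox.put(buffer)             // step 1: stage 1 sends a reference
    val received = stage2Mailbox.take()   // stage 2 receives the message
    val sameObject = received eq buffer   // step 2: both stages hold the same buffer
    buffer(0) = 42                        // step 3: stage 1 writes after the send
    (sameObject, received(0))             // stage 2 observes the write
  }
}
```

Sending the reference copies nothing, which is what makes the approach fast and also what makes it unsafe without further checks.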
with affine types and object capabilities
  – Affine types:
    • Variables of affine type may be "used" at most once
    • "Used" = consumed
    • A consumed variable cannot be accessed any more
  – Values of affine type are called permissions in our system
  – Permissions control access to transferable objects
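The "used at most once" rule can be illustrated with a runtime emulation. This is only a sketch under assumptions: the class `Affine` and the method `secondUseRejected` are hypothetical names, and the check happens at run time, whereas the system described in the talk rejects a second use statically, at compile time. The sketch is also not thread-safe.

```scala
object AffineSketch {
  // Hypothetical wrapper whose content may be consumed at most once.
  final class Affine[T](initial: T) {
    private var value: Option[T] = Some(initial)

    // Consuming hands out the content and makes the wrapper unusable.
    def consume(): T = value match {
      case Some(v) =>
        value = None
        v
      case None =>
        throw new IllegalStateException("already consumed")
    }
  }

  // Returns true if the second use of the same permission is rejected.
  def secondUseRejected(): Boolean = {
    val permission = new Affine("may open the box")
    permission.consume() // first use: allowed
    try {
      permission.consume() // second use: not allowed
      false
    } catch {
      case _: IllegalStateException => true
    }
  }
}
```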
and Object Capabilities
Transferable objects: instances of a new generic type Box[T]

    class Message {
      var buffer: Array[Byte] = _
    }

    def receive(box: Box[Message]): Unit = {
      box open { msg =>
        msg.buffer = Array(1, 2, 3, 4)
      }
      ...
    }

• Accessing an encapsulated object requires the use of open
• msg is the encapsulated object
opening a box requires a corresponding permission provided by the context
• Invoking open on box requires a permission with the following type (a dependent type):

    CanAccess { type C = box.C }

• Type member C links the permission type to a specific box
• A permission type CanAccess { type C = låda.C } would only be compatible with box iff
  – box and låda are aliases (statically-known)
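How a type member can tie a permission to one particular box can be sketched with Scala's path-dependent types. The sketch below is an assumption-laden illustration, not LaCasa's actual API: `Packed` is a hypothetical helper, and the permission is passed explicitly instead of being provided by the context.

```scala
object PermissionSketch {
  // Permission type as on the slide; the type member C names one specific box.
  sealed trait CanAccess { type C }

  final class Message { var buffer: Array[Byte] = Array.empty[Byte] }

  // Hypothetical Box: every instance carries its own abstract type member C.
  final class Box[T](private val contents: T) {
    type C
    // Opening needs a permission whose C is exactly this box's C.
    def open(f: T => Unit)(perm: CanAccess { type C = Box.this.C }): Unit =
      f(contents)
  }

  // Hypothetical helper: creates the one permission that matches `box`.
  final class Packed[T](val box: Box[T]) {
    val access: CanAccess { type C = box.C } = new CanAccess { type C = box.C }
  }

  def demo(): Int = {
    val first = new Packed(new Box(new Message))
    val second = new Packed(new Box(new Message))
    var length = 0
    first.box.open { msg =>
      msg.buffer = Array[Byte](1, 2, 3, 4)
      length = msg.buffer.length
    }(first.access)
    second.box.open(_ => ())(second.access)
    // second.box.open(_ => ())(first.access)
    // The line above should not compile: first.box.C and second.box.C are different types.
    length
  }
}
```

The permission for `first.box` is accepted only where the compiler can see that the receiver is `first.box` itself, which mirrors the aliasing condition stated above.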
restricting types put into boxes
• Requirements for “safe” classes (simplified):
  – Methods only access parameters and this
  – Method parameter types are “safe”
  – Methods only instantiate “safe” classes
  – Types of fields are “safe”
• “Safe” = conforms to object capability model [4]
[4] Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, 2006
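A small illustration of the first requirement, based on one reading of the simplified rules above; the classes are hypothetical and the actual conformance check of the system is not shown. `SafeCounter` only touches its parameters and `this`. `LoggingCounter` also reaches into a global singleton, so state can leak out of a box that contains it.

```scala
object OcapSketch {
  // Global mutable state: reachable from anywhere without being passed in.
  object GlobalLog { var entries: List[String] = Nil }

  // Follows the simplified rules: methods only access parameters and `this`.
  final class SafeCounter(private var count: Int) {
    def add(n: Int): Int = { count += n; count }
  }

  // Breaks the first rule: the method accesses the global GlobalLog object.
  final class LoggingCounter(private var count: Int) {
    def add(n: Int): Int = {
      count += n
      GlobalLog.entries = s"count=$count" :: GlobalLog.entries
      count
    }
  }
}
```

Even if a `LoggingCounter` were transferred to another actor, the sender could still observe its activity through `GlobalLog`, which is the kind of hidden channel the restriction is meant to rule out.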
of object capabilities (type-based), uniqueness, separation, concurrency
• Meta theory
  – Type soundness
  – Isolation theorem for processes with shared heap
• Paper: [5] Haller and Loiko. LaCasa: Lightweight affinity and object capabilities in Scala. OOPSLA 2016
studies
  – How much effort to change existing code?
• Complete mechanization of meta-theory
[6] Erik Reimers. Lightweight Software Isolation via Flow-Sensitive Capabilities in Scala. Master's thesis, KTH, 2017 (supervisor Philipp Haller)
[7] Haller, Sommar. Towards an Empirical Study of Affine Types for Isolated Actors in Scala. PLACES@ETAPS 2017
when transferring objects conforming to the object capability discipline
  – Binary check whether a class is reusable unchanged
• Integration with the full Scala language
• In medium to large open-source Scala projects, 21-67% of all classes conform to the object capability discipline
    val set = Set.empty[Int]
    Future { set.put(1) }
    set.put(2)

Assume: concurrent set
Eventually, set contains both 1 and 2, always
Bottom line: it depends on the datatype
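A runnable variant of the snippet, as a sketch: the slide only assumes "a concurrent set", so the choice of `TrieMap` from the Scala standard library as that set is an assumption made here. Whichever of the two insertions happens first, the final contents are the same.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ConcurrentSetSketch {
  def run(): Set[Int] = {
    // A thread-safe map with Unit values plays the role of a concurrent set.
    val set = TrieMap.empty[Int, Unit]
    val done = Future { set.put(1, ()) } // insertion on another thread
    set.put(2, ())                       // insertion on the calling thread
    Await.ready(done, 10.seconds)        // wait until both insertions happened
    set.keySet.toSet
  }
}
```

With a plain mutable variable holding "the last value written" instead of a set, the final state would depend on the interleaving, which is the sense in which the outcome depends on the datatype.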
• More precisely: "All non-failing executions compute the same result." ("Quasi-determinism" [8])
[8] Kuper et al. Freeze after writing: quasi-deterministic parallel programming with LVars. POPL 2014
on:
  – event-driven concurrency (similar to futures and promises)
  – lattice-based data types
  – reactive programming
• Build on LaCasa's type system to provide a quasi-determinism guarantee at compile time
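Why lattice-based data types help with determinism can be shown with a minimal sketch. The names `Lattice`, `Cell` and `maxInt` are assumptions for illustration and not the API of the system presented here. A cell only ever moves upward by joining new values into its state; because the join is commutative, associative and idempotent, the final state does not depend on the order in which updates arrive.

```scala
object LatticeSketch {
  // Join-semilattice: join must be commutative, associative and idempotent.
  trait Lattice[V] {
    def bottom: V
    def join(a: V, b: V): V
  }

  // Example lattice: integers ordered by <=, join is the maximum.
  val maxInt: Lattice[Int] = new Lattice[Int] {
    val bottom: Int = Int.MinValue
    def join(a: Int, b: Int): Int = math.max(a, b)
  }

  // A cell whose state can only grow with respect to the lattice order.
  final class Cell[V](lattice: Lattice[V]) {
    private var current: V = lattice.bottom
    def put(v: V): Unit = synchronized { current = lattice.join(current, v) }
    def get: V = synchronized { current }
  }

  // Applies the updates in the given order and returns the final state.
  def finalValue(updates: Seq[Int]): Int = {
    val cell = new Cell(maxInt)
    updates.foreach(cell.put)
    cell.get
  }

  // True if every ordering of the updates leads to the same final state.
  def allOrdersAgree(updates: Seq[Int]): Boolean =
    updates.permutations.map(finalValue).toSet.size == 1
}
```

For example, `allOrdersAgree(Seq(3, 1, 2))` checks all six orderings of the three updates.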
Which types does method f possibly return?
Class type hierarchy (figure): D, E, F, G

    class C {
      def f(x: Int): D = if (x <= 0) g(x) else h(x-1)
      def g(y: Int): E = new E(y)
      def h(z: Int): D = if (z == 0) new F else f(z)
    }
methods g and h; method h calls method f
• Programming model lets us express the resulting dependencies, forming a directed graph:
  f → g, f → h, h → f (edges labelled "calls")
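The question of which types f may return can be answered by iterating over this graph until nothing changes. The following is a sequential sketch under assumptions, not the concurrent programming model of the talk: the maps `direct` and `dependsOn` are read off class C by hand, and types are represented by their names.

```scala
object ReturnTypesSketch {
  // Types each method instantiates and returns directly (read off class C).
  val direct: Map[String, Set[String]] =
    Map("f" -> Set.empty[String], "g" -> Set("E"), "h" -> Set("F"))

  // "calls" edges whose results flow into the caller: f -> g, f -> h, h -> f.
  val dependsOn: Map[String, Set[String]] =
    Map("f" -> Set("g", "h"), "g" -> Set.empty[String], "h" -> Set("f"))

  // Fixpoint iteration: keep joining callee results (set union) into callers.
  def solve(): Map[String, Set[String]] = {
    var result = direct
    var changed = true
    while (changed) {
      changed = false
      for ((method, callees) <- dependsOn) {
        val joined = callees.foldLeft(result(method))((acc, callee) => acc ++ result(callee))
        if (joined != result(method)) {
          result = result.updated(method, joined)
          changed = true
        }
      }
    }
    result
  }
}
```

The cycle between f and h is harmless here because set union only grows and the set of type names is finite, so the iteration terminates. Under these assumptions f may return an E or an F.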
  – Extension of imperative, object-oriented base language
  – Resolution of cyclic dependencies
  – Type system for object capabilities for safety
• Experimental evaluation using large-scale, concurrent static analysis
• Prototype implementation: https://github.com/phaller/reactive-async
[9] Haller, Geries, Eichberg, and Salvaneschi. Reactive Async: Expressive deterministic concurrency. Scala Symposium 2016
particular expected result?
Lineage may record information about:
• Data sets read/transformed for producing result data set
• Services used for producing response
• Etc.
New data-centric programming model for functional processing of distributed data.
Key ideas:
• Provide lineages by programming abstractions
• Keep data stationary (if possible), send functions
• Utilize lineages for fault recovery
parts:
• Silos: stationary, typed, immutable data containers
• SiloRefs: references to local or remote Silos
• Spores [10]: safe, serializable functions
[10] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
parts.
• Silo. Typed, stationary data container.
• SiloRef. Handle to a Silo.
User interacts with SiloRef. SiloRefs come with 4 primitive operations:
  – def apply
  – def send
  – def persist
  – def unpersist
apply
Takes a function that is to be applied to the data in the silo associated with the SiloRef. Creates a new silo to contain the data that the user-defined function returns; evaluation is deferred.

    def apply[S](fun: T => SiloRef[S]): SiloRef[S]

Enables interesting computation DAGs.
Evaluation: deferred
send
Forces the built-up computation DAG to be sent to the associated node and applied. The future is completed with the result of the computation.

    def send(): Future[T]

Evaluation: eager
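The difference between the deferred apply and the eager send can be seen in a toy, single-process model. This is a sketch under strong assumptions and not the real system: there is no distribution, no host argument to `populate`, plain functions instead of spores, and the class body is invented for illustration. Only the two method signatures follow the slides.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SiloSketch {
  final case class Person(name: String, age: Int)

  // Toy stand-in for a SiloRef: it only stores a deferred computation.
  final class SiloRef[T] private (compute: () => T) {
    // Deferred: extends the staged computation, runs nothing yet.
    def apply[S](fun: T => SiloRef[S]): SiloRef[S] =
      new SiloRef[S](() => fun(compute()).force())

    // Eager: runs the staged computation and completes a future with the result.
    def send(): Future[T] = Future(compute())

    private def force(): T = compute()
  }

  object SiloRef {
    // The populate on the slides also takes a host; this local sketch drops it.
    def populate[T](data: T): SiloRef[T] = new SiloRef[T](() => data)
  }

  def demo(): (Int, List[String]) = {
    var filterRuns = 0
    val persons = SiloRef.populate(List(Person("Ann", 30), Person("Bob", 12)))
    val adults = persons.apply { ps =>
      filterRuns += 1
      SiloRef.populate(ps.filter(_.age >= 18))
    }
    val runsBeforeSend = filterRuns // the filter has only been staged so far
    val names = Await.result(adults.send(), 10.seconds).map(_.name)
    (runsBeforeSend, names)
  }
}
```

In this model the user function runs only once `send()` is called, which matches the staging behaviour described for the two operations.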
make an interesting DAG!
[Figure: DAG of silos: persons, vehicles, adults, owners, sorted, labels]

    val persons: SiloRef[List[Person]] = ...
    val vehicles: SiloRef[List[Vehicle]] = ...

    val adults = persons.apply(spore { ps =>
      val res = ps.filter(p => p.age >= 18)
      SiloRef.populate(currentHost, res)
    })

    // adults that own a vehicle
    val owners = adults.apply(spore {
      val localVehicles = vehicles // spore header
      ps => localVehicles.apply(spore {
        val localps = ps // spore header
        vs => SiloRef.populate(currentHost,
          localps.flatMap(p =>
            // list of (p, v) for a single person p
            vs.flatMap { v => if (v.owner.name == p.name) List((p, v)) else Nil }
          ))
      })
    })

    val sorted = adults.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.sortBy(p => p.age))
    })

    val labels = sorted.apply(spore { ps =>
      SiloRef.populate(currentHost, ps.map(p => "Hi, " + p.name))
    })

So far we have just staged computation; we haven't yet "kicked it off".
Putting lineages to work
a lineage, a persistent (in the sense of functional programming) data structure.
• The lineage is the DAG of operations used to derive the data of a silo.
• Since the lineage is composed of spores, it is serializable. This means it can be persisted or transferred to other machines.
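How a lineage supports recovery can be sketched with a small immutable data structure. The names `Source`, `Applied`, `Silo` and `crash` are assumptions for illustration; plain Scala functions stand in for spores, and serialization and distribution are left out.

```scala
object LineageSketch {
  // A lineage is an immutable description of how data was derived.
  sealed trait Lineage[T] { def replay(): T }

  // Leaf: data that is available directly.
  final case class Source[T](data: T) extends Lineage[T] {
    def replay(): T = data
  }

  // Inner node: the result of applying a function to a parent lineage.
  final case class Applied[S, T](parent: Lineage[S], fun: S => T) extends Lineage[T] {
    def replay(): T = fun(parent.replay())
  }

  // Hypothetical silo: caches its data but can always rebuild it from the lineage.
  final class Silo[T](val lineage: Lineage[T]) {
    private var cached: Option[T] = None
    def crash(): Unit = cached = None // simulates losing the materialized data
    def data: T = cached.getOrElse {
      val recomputed = lineage.replay()
      cached = Some(recomputed)
      recomputed
    }
  }

  def demo(): (List[Int], List[Int]) = {
    val source: Lineage[List[Int]] = Source(List(12, 30, 45))
    val adults: Lineage[List[Int]] =
      Applied(source, (ages: List[Int]) => ages.filter(_ >= 18))
    val silo = new Silo(adults)
    val before = silo.data
    silo.crash()
    (before, silo.data) // the data after the crash is rebuilt by replaying the lineage
  }
}
```

Because the lineage itself is never mutated, it can be shared or copied freely, which is what "persistent in the sense of functional programming" refers to.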
Putting lineages to work
database + systems communities, in the context of PL.
• Natural fit in context of functional programming!
• A functional design for fault-tolerance
• Formalization: typed, distributed core language with spores, silos, and futures.
theorem guarantees preservation of types under reduction, as well as preservation of lineage mobility.
Progress theorem guarantees the finite materialization of remote, lineage-based data.
First correctness results for a programming model for lineage-based distributed computation.
reuse via object capabilities
• Reactive Async: practical deterministic concurrency
  – For an imperative, object-oriented language
  – Type system guarantees quasi-determinism at compile time
• Lineage-based distributed programming
  – First correctness results for a lineage-based distributed programming model
    • Finite materialization of distributed, lineage-based data
tolerance
• Determinism
• Distributed Shared State
• Security & Privacy
  – Privacy-aware distribution
  – Information-flow security
• Chaos Engineering
  – Testing hypotheses about resilience in production systems
[12] Zhao, Haller. Observable atomic consistency for CvRDTs. CoRR abs/1802.09462 (2018)
[13] Salvaneschi, Köhler, Haller, Erdweg, and Mezini. Language-Integrated Privacy-Aware Distributed Queries. 2018, draft
[14] Zhang, Morin, Haller, Baudry, Monperrus. A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM. CoRR abs/1805.05246 (2018)
Haller, Salvaneschi, Watanabe, Agha. "Programming Languages for Distributed Systems", May 27–30, 2019
• Dagstuhl Seminar: Haller, Lopes, Markl, Salvaneschi. "Programming Languages for Distributed Systems and Distributed Data Management" (65-0618), October 28–31, 2019
[3] https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
[4] Mark S. Miller. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis, 2006
[5] Haller and Loiko. LaCasa: Lightweight affinity and object capabilities in Scala. OOPSLA 2016
[6] Erik Reimers. Lightweight Software Isolation via Flow-Sensitive Capabilities in Scala. Master's thesis, KTH, 2017 (supervisor Philipp Haller)
[7] Haller, Sommar. Towards an Empirical Study of Affine Types for Isolated Actors in Scala. PLACES@ETAPS 2017
[8] Kuper et al. Freeze after writing: quasi-deterministic parallel programming with LVars. POPL 2014
[9] Haller, Geries, Eichberg, and Salvaneschi. Reactive Async: Expressive Deterministic Concurrency. Scala Symposium 2016
[10] Miller, Haller, and Odersky. Spores: a type-based foundation for closures in the age of concurrency and distribution. ECOOP 2014
[11] Haller, Miller, and Müller. A programming model and foundation for lineage-based distributed computation. Journal of Functional Programming 28 (2018): e7
[12] Zhao, Haller. Observable atomic consistency for CvRDTs. CoRR abs/1802.09462 (2018)
[13] Salvaneschi, Köhler, Haller, Erdweg, and Mezini. Language-Integrated Privacy-Aware Distributed Queries. 2018, draft
[14] Zhang, Morin, Haller, Baudry, Monperrus. A Chaos Engineering System for Live Analysis and Falsification of Exception-handling in the JVM. CoRR abs/1805.05246 (2018)
and distributed systems providing high availability, high scalability, and fault tolerance
• Methods:
  – Type systems: theory & practice
  – Design and implementation of programming systems
  – Empirical studies
Sound foundations and provable guarantees!
Thank You!