Slide 1

Slide 1 text

Programming Reactive Systems in Scala: Principles and Abstractions
Philipp Haller
KTH Royal Institute of Technology, Stockholm, Sweden
Entwicklertag Frankfurt, Germany, 21 February 2018

Slide 2

Slide 2 text

What are reactive systems?
• Multiple definitions proposed previously, e.g. by Gérard Berry [1] and by the Reactive Manifesto [2]
• Common among definitions: reactive systems
  • react to events or messages from their environment
  • react (typically) "at a speed which is determined by the environment, not the program itself" [1]
• Thus, reactive systems are:
  • responsive
  • scalable

Slide 3

Slide 3 text

What makes it so difficult to build reactive systems?
1. Workloads require massive scalability
  • Steam, a digital distribution service, delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]
  • CERN amassed about 200 PB of data from over 800 trillion collisions looking for the Higgs boson [4]
  • Twitter has about 330 million monthly active users [5]
2. Reacting at the speed of the environment (guaranteed timely responses)

Slide 4

Slide 4 text

Steam delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]

Slide 5

Slide 5 text

What makes it so difficult to build reactive systems?
1. Workloads require massive scalability
  • Steam, a digital distribution service, delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]
  • CERN amassed about 200 PB of data from over 800 trillion collisions looking for the Higgs boson [4]
  • Twitter has about 330 million monthly active users [5]
2. Reacting at the speed of the environment (guaranteed timely responses)
Slide annotations: February 2018; Q4, 2017

Slide 6

Slide 6 text

Example: Twitter during Obama's inauguration
“We saw 5x normal tweets-per-second and about 4x tweets-per-minute as this chart illustrates.” [6]

Slide 7

Slide 7 text

Implications
• Massive scalability ➟ large-scale distribution
• Timely responses + distribution ➟ resiliency
"To make a fault-tolerant system you need at least two computers." - Joe Armstrong [7]

Slide 8

Slide 8 text

How to program reactive systems?
We want to build systems that respond to events emitted by their environment in a way that enables scalability, distribution, and resiliency
• We're looking for programming abstractions!
• How did we approach this in the Scala project?

Slide 9

Slide 9 text

Example
• Chat service
• Many long-lived connections
• Usually idle, with short bursts of traffic

Slide 10

Slide 10 text

Chat service: first try
• Thread per user session
• Huge overheads stemming from heavyweight threads
• Does not scale to large numbers of users

Slide 11

Slide 11 text

Chat service: second try
• Asynchronous I/O and thread pool
• Session state maintained in regular objects (e.g., POJOs)
• Much more scalable
• Problems:
  • Code difficult to maintain ➟ "callback hell" [8]
  • Blocking calls fatal

Slide 12

Slide 12 text

The trouble with blocking ops
Example:
    def after[T](delay: Long, value: T): Future[T]
Function for creating a Future that is completed with value after delay milliseconds

Slide 13

Slide 13 text

"after", version 1
    def after1[T](delay: Long, value: T) =
      Future {
        Thread.sleep(delay)
        value
      }

Slide 14

Slide 14 text

"after", version 1
How does it behave?
    assert(Runtime.getRuntime().availableProcessors() == 8)
    for (_ <- 1 to 8) yield after1(1000, true)
    val later = after1(1000, true)
Quiz: when is “later” completed?
Answer: after either ~1 s or ~2 s (most often)
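The quiz answer follows from thread starvation: each call to after1 parks a pool thread in Thread.sleep, so once every thread is occupied, the next future cannot even start until one of them wakes up. The following is a minimal sketch of that effect, not the setup from the talk: it assumes an explicit fixed-size pool (instead of the default pool on an 8-core machine) and shorter delays, and the names `BlockingDemo` and `elapsedMillis` are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingDemo {
  // Version 1 from the slide, with the execution context made explicit.
  def after1[T](delay: Long, value: T)(implicit ec: ExecutionContext): Future[T] =
    Future {
      Thread.sleep(delay) // blocks a pool thread for the whole delay
      value
    }

  // Starts `poolSize` sleeping futures plus one more ("later") and returns
  // the number of milliseconds until "later" completes.
  def elapsedMillis(poolSize: Int, delay: Long): Long = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val start = System.nanoTime()
      for (_ <- 1 to poolSize) after1(delay, true) // occupies every pool thread
      val later = after1(delay, true)              // queued until a thread is free
      Await.result(later, 10.seconds)
      (System.nanoTime() - start) / 1000000L
    } finally pool.shutdown()
  }
}
```

With a pool of two threads and a delay of 200 ms, `BlockingDemo.elapsedMillis(2, 200L)` takes roughly twice the delay, because "later" first waits for a sleeping thread and only then starts its own sleep.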

Slide 15

Slide 15 text

Promises
    object Promise {
      def apply[T](): Promise[T]
    }

    trait Promise[T] {
      def success(value: T): Promise[T]
      def failure(cause: Throwable): Promise[T]
      def future: Future[T]
    }
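A promise is the write side of a future: whoever holds the promise completes it once, and everyone holding `promise.future` observes the result. A small sketch using only the standard library; `PromiseDemo` and `run` are hypothetical names, and `trySuccess` is the standard non-throwing variant of `success`.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object PromiseDemo {
  // Returns (completed before write?, value read, did a second write succeed?).
  def run(): (Boolean, Int, Boolean) = {
    val promise = Promise[Int]()          // nothing written yet
    val future  = promise.future          // read side, handed to consumers
    val completedBefore = future.isCompleted

    promise.success(42)                   // completes the future exactly once
    val secondWrite = promise.trySuccess(7) // rejected: already completed

    (completedBefore, Await.result(future, 1.second), secondWrite)
  }
}
```

Calling `success` a second time would throw an `IllegalStateException`, which is why the sketch uses `trySuccess` for the second write.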

Slide 16

Slide 16 text

"after", version 2
    def after2[T](delay: Long, value: T) = {
      val promise = Promise[T]()
      timer.schedule(new TimerTask {
        def run(): Unit = promise.success(value)
      }, delay)
      promise.future
    }
Much better behaved!
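The slide does not show where `timer` comes from. A self-contained sketch, assuming a single shared `java.util.Timer` running on a daemon thread (that choice, and the names `TimerDemo` and `demo`, are assumptions made for illustration):

```scala
import java.util.{Timer, TimerTask}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object TimerDemo {
  // Assumption: one shared daemon timer serves all delayed completions.
  private val timer = new Timer(true)

  def after2[T](delay: Long, value: T): Future[T] = {
    val promise = Promise[T]()
    timer.schedule(new TimerTask {
      // Runs on the timer thread once the delay has passed.
      def run(): Unit = { promise.success(value); () }
    }, delay)
    promise.future // returned immediately; no pool thread is blocked
  }

  def demo(): String = Await.result(after2(50L, "done"), 2.seconds)
}
```

No thread sleeps while the delay is pending, so any number of outstanding `after2` calls share the one timer thread instead of exhausting the thread pool.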

Slide 17

Slide 17 text

Chat service example
• Neither of the shown approaches is satisfactory
• Thread-based approach induces huge overheads, does not scale
• Event-driven approach suffers from callback hell, and blocking operations are troublesome
We need better programming abstractions which reconcile scalability and productivity

Slide 18

Slide 18 text

Better programming abstractions
• At the end of 2005, our main influence was the Erlang programming language
  • One of very few success stories in the area of concurrent programming
  • Had been used successfully to build the influential Ericsson AXD301 switch providing an availability of nine nines (less than 32 ms downtime per year)
  • … and there was a really great movie about Erlang [9] ;-)
• Additional influences, including Argus [10], the join-calculus [11], and other seminal languages and systems

Slide 19

Slide 19 text

Erlang and the actor model
• Erlang: a dynamic, functional, distributed, concurrency-oriented programming language
• Provides an implementation of the actor model of concurrency [12]
• Actors = concurrent "processes" communicating via message passing
  • No shared state
  • Senders decoupled from receivers ➟ asynchronous messaging
• Upon receiving a message, an actor may
  • change its behavior/state
  • send messages to actors (including itself)
  • create new actors
Sender does not fail if receiver fails!

Slide 20

Slide 20 text

Actors in Scala (using Akka)
Definition of an actor class:
    class Counter extends Actor with ActorLogging {
      var sum = 0
      def receive = {
        case AddAll(values) =>
          sum += values.reduce((x, y) => x + y)
        case PrintSum() =>
          log.info(s"the sum is: $sum")
      }
    }

    case class AddAll(values: Array[Int])
    case class PrintSum()

Slide 21

Slide 21 text

Client of an actor
Creating and using an actor:
    object Main {
      def main(args: Array[String]): Unit = {
        val system = ActorSystem("system")
        val counter: ActorRef = system.actorOf(Counter.props, "counter")
        // Asynchronous message sends
        counter ! AddAll(Array(1, 2, 3))
        counter ! AddAll(Array(4, 5))
        counter ! PrintSum()
      }
    }
Actor creation properties:
    object Counter {
      def props: Props = Props(new Counter)
    }

Slide 22

Slide 22 text

Actors: important features
• Actors are isolated
  • Field sum not accessible from outside
  • Ensured by exposing only an ActorRef to clients
  • ActorRef provides an extremely simple interface
• Messages in actor's mailbox are processed sequentially
  • No concurrency control necessary within an actor
• Messaging is location-transparent
  • ActorRefs may be remote; can be sent in messages
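To see why sequential mailbox processing removes the need for locks, here is a deliberately tiny stand-in for an actor built only from the JDK. It is not how Akka is implemented; it assumes a single-threaded executor as the mailbox, and `MailboxDemo`, `CounterRef`, and `Report` are hypothetical names.

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, TimeUnit}

object MailboxDemo {
  sealed trait Msg
  final case class AddAll(values: Array[Int]) extends Msg
  final case class Report(reply: Int => Unit) extends Msg

  // Clients only see `!` and `stop`; the state stays hidden inside.
  final class CounterRef {
    private val mailbox = Executors.newSingleThreadExecutor()
    private var sum = 0 // touched only by the mailbox thread, so no locking

    private def receive(msg: Msg): Unit = msg match {
      case AddAll(values) => sum += values.sum
      case Report(reply)  => reply(sum)
    }

    // Asynchronous send: enqueue the message and return immediately.
    def !(msg: Msg): Unit =
      mailbox.execute(new Runnable { def run(): Unit = receive(msg) })

    def stop(): Unit = mailbox.shutdown()
  }

  def demo(): Int = {
    val counter = new CounterRef
    val replies = new LinkedBlockingQueue[Int]()
    // Four client threads send concurrently; the mailbox serializes them.
    val senders = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = for (_ <- 1 to 1000) counter ! AddAll(Array(1, 2))
      })
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    counter ! Report(s => replies.put(s))
    val result = replies.poll(5, TimeUnit.SECONDS)
    counter.stop()
    result
  }
}
```

Even though four threads send at the same time, `sum` is updated by one thread, one message at a time, so no update is lost.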

Slide 23

Slide 23 text

Resiliency using actors
• Erlang's approach to fault handling: "let it crash!"
• Do not:
  • try to avoid failure
  • attempt to repair program state/data in case of failure
• Do:
  • let faulty actors crash
  • manage crashed actors via supervision

Slide 24

Slide 24 text

Actor supervision: strategy 1

Slide 25

Slide 25 text

Actor supervision: strategy 2

Slide 26

Slide 26 text

Actor supervision: strategy 3

Slide 27

Slide 27 text

Resiliency (continued)
How to restart a fresh actor from some previous state?
• Supervisor initializes its state, or
• Fresh actor obtains initial state from elsewhere, or
• Fresh actor replays received messages from persistent log ➟ event sourcing: Akka Persistence
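The third option, replaying a log, amounts to folding the stored events over an initial state. The sketch below shows only that idea in plain Scala; it does not use the Akka Persistence API, and `ReplayDemo`, `Added`, `Cleared`, and `CounterState` are hypothetical names.

```scala
object ReplayDemo {
  // Events as they would be appended to a persistent log.
  sealed trait Event
  final case class Added(values: List[Int]) extends Event
  case object Cleared extends Event

  final case class CounterState(sum: Int)

  // Pure state transition: applying one event to the current state.
  def applyEvent(state: CounterState, event: Event): CounterState = event match {
    case Added(values) => state.copy(sum = state.sum + values.sum)
    case Cleared       => CounterState(0)
  }

  // A fresh actor recovers by replaying the log in its original order.
  def recover(log: Seq[Event]): CounterState =
    log.foldLeft(CounterState(0))(applyEvent)
}
```

Because the transition function is deterministic, replaying the same log always rebuilds the same state, which is what allows a restarted actor to continue where the crashed one stopped.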

Slide 28

Slide 28 text

Actors in Scala
• Q: Is all of this built into Scala?
• A: Not quite.

Slide 29

Slide 29 text

Deconstructing actors
    def receive = {
      case AddAll(values) =>
        sum += values.reduce((x, y) => x + y)
      case PrintSum() =>
        log.info(s"the sum is: $sum")
    }
• receive method returns a partial function defined by the block of cases { … }

Slide 30

Slide 30 text

Deconstructing actors
    object Actor {
      // Type alias for receive blocks
      type Receive = PartialFunction[Any, Unit]
      // ...
    }

    trait Actor {
      def receive: Actor.Receive
      // ...
    }

Slide 31

Slide 31 text

Partial functions
• Partial functions have a type PartialFunction[A, B]
• PartialFunction[A, B] is a subtype of Function1[A, B]
    trait Function1[A, B] {
      def apply(x: A): B
      ..
    }

    trait PartialFunction[A, B] extends Function1[A, B] {
      def isDefinedAt(x: A): Boolean
      def orElse[A1 <: A, B1 >: B]
        (that: PartialFunction[A1, B1]): PartialFunction[A1, B1]
      ..
    }
Simplified!
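A short sketch of how `isDefinedAt` and `orElse` behave on partial functions written as blocks of cases, which is the mechanism behind composing receive blocks. The names `PartialFunctionDemo`, `onInt`, `onString`, and `describe` are illustrative only.

```scala
object PartialFunctionDemo {
  // Each block of cases is a PartialFunction defined only where a case matches.
  val onInt: PartialFunction[Any, String] = { case n: Int => s"int $n" }
  val onString: PartialFunction[Any, String] = { case s: String => s"string $s" }

  // orElse tries the left function first and falls back to the right one.
  val combined: PartialFunction[Any, String] = onInt orElse onString

  // isDefinedAt lets the caller test for a match before applying.
  def describe(msg: Any): String =
    if (combined.isDefinedAt(msg)) combined(msg) else "unhandled"
}
```

Applying such a partial function to a value outside its domain throws a `MatchError`, so a caller that cannot be sure of a match checks `isDefinedAt` first, as `describe` does.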

Slide 32

Slide 32 text

Pattern matching
The case clauses are just regular pattern matching in Scala:
    {
      case AddAll(values) =>
        sum += values.reduce((x, y) => x + y)
      case PrintSum() =>
        log.info(s"the sum is: $sum")
    }

    val opt: Option[Int] = this.getOption()
    opt match {
      case Some(x) => // full optional object
        // use `x` of type `Int`
      case None => // empty optional object
        // no value available
    }

Slide 33

Slide 33 text

Deconstructing actors
    counter ! AddAll(Array(1, 2, 3))
    counter ! AddAll(Array(4, 5))
    counter ! PrintSum()
"Aha! Built-in support for messaging!!"
The ! operator is just a method written using infix syntax:
    abstract class ActorRef extends .. {
      def !(message: Any): Unit
      // ..
    }
Simplified! Not actual implementation!
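That `!` needs no language support can be checked with any class that defines a method of that name. A minimal sketch with a hypothetical `Inbox` class that merely records what it is sent:

```scala
import scala.collection.mutable.ListBuffer

object InfixDemo {
  // Hypothetical class: `!` is an ordinary method that stores the message.
  final class Inbox {
    private val buffer = ListBuffer.empty[Any]
    def !(message: Any): Unit = { buffer += message; () }
    def received: List[Any] = buffer.toList
  }

  def demo(): List[Any] = {
    val inbox = new Inbox
    inbox ! "infix"      // infix syntax
    inbox.!("explicit")  // the same method, called with dot syntax
    inbox.received
  }
}
```

Both calls reach the same method, so `InfixDemo.demo()` returns the two messages in the order they were sent.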

Slide 34

Slide 34 text

Summary
• Actors not built into Scala
  • Rely only on shared-memory threads of the JVM
• Scala as a "growable" language [13]
  • Programming models as libraries
• Akka actors = domain-specific language (DSL) embedded in Scala
  • Many of the patterns and techniques first implemented in Scala Actors [14]

Slide 35

Slide 35 text

https://www.lightbend.com/akka-five-year-anniversary

Slide 36

Slide 36 text

There is more
• Q: Actors are clearly awesome! All problems solved?
• A: Not quite.

Slide 37

Slide 37 text

Example
Image processing pipeline:
[Figure: image data passes through the stages "filter 1" and "filter 2"; each stage applies a filter]
Pipeline stages run concurrently

Slide 38

Slide 38 text

Implementation
• Assumptions:
  • Image data large
  • Main memory expensive
• Approach for high performance:
  • In-place update of image buffers
  • Pass mutable buffers by-reference

Slide 39

Slide 39 text

Problem
Easy to produce data races:
1. Stage 1 sends a reference to a buffer to stage 2
2. Following the send, both stages have a reference to the same buffer
3. Stages can concurrently access the buffer

Slide 40

Slide 40 text

Preventing data races
• Approach: safe transfer of ownership
  • Sending stage loses ownership
  • Compiler prevents sender from accessing objects that have been transferred
• Advantages:
  • No run-time overhead
  • Safety does not compromise performance
  • Errors caught at compile time

Slide 41

Slide 41 text

Ownership transfer in Scala
• Active research project: LaCasa [15]
• LaCasa: Scala extension for affine references
  • "Transferable" references
  • At most one owner per transferable reference

Slide 42

Slide 42 text

Affine references in LaCasa
• LaCasa provides affine references by combining two concepts:
  • Access permissions
  • Encapsulated boxes

Slide 43

Slide 43 text

Access permissions
• Access to transferable objects controlled by implicit permissions
    CanAccess { type C }
    Box[T] { type C }
• Type member C uniquely identifies box

Slide 44

Slide 44 text

Creating boxes and permissions
    class Message {
      var arr: Array[Int] = _
    }

    mkBox[Message] { packed =>
      implicit val access = packed.access
      val box = packed.box
      …
    }
LaCasa library:
    sealed trait Packed[+T] {
      val box: Box[T]
      val access: CanAccess { type C = box.C }
    }

Slide 45

Slide 45 text

Accessing boxes
• Boxes are encapsulated
• Boxes must be opened for access
    mkBox[Message] { packed =>
      implicit val access = packed.access
      val box = packed.box
      // Requires implicit access permission
      box open { msg =>
        msg.arr = Array(1, 2, 3, 4)
      }
    }

Slide 46

Slide 46 text

Consuming permissions
Example: transferring a box from one actor to another consumes its access permission
    mkBox[Message] { packed =>
      implicit val access = packed.access
      val box = packed.box
      …
      someActor.send(box) {
        // make `access` unavailable
        …
      }
    }
Leverage spores [16]

Slide 47

Slide 47 text

Encapsulation
Problem: not all types safe to transfer!
    class Message {
      var arr: Array[Int] = _
      def leak(): Unit = {
        SomeObject.fld = arr
      }
    }

    object SomeObject {
      var fld: Array[Int] = _
    }

Slide 48

Slide 48 text

Encapsulation
• Ensuring absence of data races requires restricting types put into boxes
• Requirements for “safe” classes:*
  • Methods only access parameters and this
  • Method parameter types are “safe”
  • Methods only instantiate “safe” classes
  • Types of fields are “safe”
“Safe” = conforms to object capability model [17]
* simplified

Slide 49

Slide 49 text

Object capabilities in Scala
• How common is object-capability safe code in Scala?
• Empirical study of over 75,000 SLOC of open-source Scala code:

Project          Version      SLOC     GitHub stats
Scala stdlib     2.11.7       33,107   ✭5,795  257
Signal/Collect   8.0.6        10,159   ✭123    11
GeoTrellis       0.10.0-RC2   35,351   ✭400    38
  -engine                      3,868
  -raster                     22,291
  -spark                       9,192

Slide 50

Slide 50 text

Object capabilities in Scala
Results of empirical study:

Project          #classes/traits   #ocap (%)     #dir. insec. (%)
Scala stdlib     1,505             644 (43%)     212/861 (25%)
Signal/Collect   236               159 (67%)     60/77 (78%)
GeoTrellis
  -engine        190               40 (21%)      124/150 (83%)
  -raster        670               233 (35%)     325/437 (74%)
  -spark         326               101 (31%)     167/225 (74%)
Total            2,927             1,177 (40%)   888/1,750 (51%)

Immutability inference increases these percentages!

Slide 51

Slide 51 text

Ongoing work
• Flow-sensitive type checking
  • "Don't indent when consuming permission"
• Empirical studies
  • How much effort to change existing code?
• Language support for immutable types [18]
• Complete mechanization in Coq proof assistant

Slide 52

Slide 52 text

Conclusion
• Scala enables powerful libraries for reactive programming
  • Akka actors representative example
  • There are many others: Akka Streams, Spark Streaming, REScala [19] etc.
• Not all concurrency hazards can be prevented by Scala's current type system.
• In ongoing research projects, such as LaCasa and Reactive Async [20], we are exploring ways to rule out data races and non-determinism

Slide 53

Slide 53 text

References (1)
• [1]: Gérard Berry, 1989. http://www-sop.inria.fr/members/Gerard.Berry/Papers/Berry-IFIP-89.pdf
• [2]: https://www.reactivemanifesto.org/
• [3]: http://store.steampowered.com/stats/content/
• [4]: https://www.itbusinessedge.com/cm/blogs/lawson/the-big-data-software-problem-behind-cerns-higgs-boson-hunt/?cs=50736
• [5]: https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
• [6]: https://blog.twitter.com/2009/inauguration-day-twitter
• [7]: http://www.erlang-factory.com/upload/presentations/45/keynote_joearmstrong.pdf
• [8]: http://static.usenix.org/publications/library/proceedings/usenix02/full_papers/adyahowell/adyahowell_html/
• [9]: https://www.youtube.com/watch?v=uKfKtXYLG78
• [10]: Liskov, 1988. Distributed programming in Argus. Communications of the ACM, 31(3), pp. 300-312. https://dl.acm.org/citation.cfm?id=42399

Slide 54

Slide 54 text

References (2)
• [11]: Fournet and Gonthier, 1996. The reflexive CHAM and the join-calculus. Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (pp. 372-385). https://dl.acm.org/citation.cfm?id=237805
• [12]: Hewitt, Bishop, and Steiger, 1973. A universal modular actor formalism for artificial intelligence. Proc. IJCAI. See also https://eighty-twenty.org/2016/10/18/actors-hopl
• [13]: Guy Steele, 1998. "Growing a Language". OOPSLA keynote. https://www.youtube.com/watch?v=_ahvzDzKdB0
• [14]: Haller and Odersky, 2007. Actors that unify threads and events. In International Conference on Coordination Languages and Models (pp. 171-190). Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/978-3-540-72794-1_10
• [15]: https://github.com/phaller/lacasa
• [16]: Miller, Haller, and Odersky, 2014. Spores: A type-based foundation for closures in the age of concurrency and distribution. Proc. ECOOP. https://github.com/scalacenter/spores
• [17]: Mark S. Miller, 2006. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis
• [18]: https://www.youtube.com/watch?v=IiCt4nZfQfg
• [19]: http://guidosalva.github.io/REScala/
• [20]: https://github.com/phaller/reactive-async