Programming Reactive Systems in Scala: Principles and Abstractions

Philipp Haller
February 21, 2018

Transcript

  1. Programming Reactive Systems in Scala: Principles and Abstractions
    Philipp Haller, KTH Royal Institute of Technology, Stockholm, Sweden
    Entwicklertag Frankfurt, Germany, 21 February 2018
  2. What are reactive systems?
    • Multiple definitions proposed previously, e.g. by Gérard Berry [1] and by the Reactive Manifesto [2]
    • Common among definitions: reactive systems
      • react to events or messages from their environment
      • react (typically) "at a speed which is determined by the environment, not the program itself" [1]
    • Thus, reactive systems are:
      • responsive
      • scalable
  3. What makes it so difficult to build reactive systems?
    1. Workloads require massive scalability
      • Steam, a digital distribution service, delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]
      • CERN amassed about 200 PB of data from over 800 trillion collisions looking for the Higgs boson [4]
      • Twitter has about 330 million monthly active users [5]
    2. Reacting at the speed of the environment (guaranteed timely responses)
  4. Steam delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]
  5. What makes it so difficult to build reactive systems?
    1. Workloads require massive scalability
      • Steam, a digital distribution service, delivers 16.9 PB per week to users in Germany (USA: 46.9 PB) [3]
      • CERN amassed about 200 PB of data from over 800 trillion collisions looking for the Higgs boson [4]
      • Twitter has about 330 million monthly active users [5]
    2. Reacting at the speed of the environment (guaranteed timely responses)
    Slide annotations: February 2018; Q4, 2017
  6. Example: Twitter during Obama's inauguration
    “We saw 5x normal tweets-per-second and about 4x tweets-per-minute as this chart illustrates.” [6]
  7. Implications
    • Massive scalability ➟ large-scale distribution
    • Timely responses + distribution ➟ resiliency
    "To make a fault-tolerant system you need at least two computers." - Joe Armstrong [7]
  8. How to program reactive systems?
    Goal: build systems that respond to events emitted by their environment in a way that enables scalability, distribution, and resiliency
    • We're looking for programming abstractions!
    • How did we approach this in the Scala project?
  9. Example
    • Chat service
    • Many long-lived connections
    • Usually idle, with short bursts of traffic
  10. Chat service: first try
    • Thread per user session
    • Huge overheads stemming from heavyweight threads
    • Does not scale to large numbers of users
  11. Chat service: second try
    • Asynchronous I/O and thread pool
    • Session state maintained in regular objects (e.g., POJOs)
    • Much more scalable
    • Problems:
      • Code difficult to maintain ➟ "callback hell" [8]
      • Blocking calls fatal
  12. The trouble with blocking ops
    Example:
        def after[T](delay: Long, value: T): Future[T]
    Function for creating a Future that is completed with value after delay milliseconds
  13. "after", version 1
        def after1[T](delay: Long, value: T) = Future {
          Thread.sleep(delay)
          value
        }
  14. "after", version 1
        assert(Runtime.getRuntime().availableProcessors() == 8)

        for (_ <- 1 to 8) yield after1(1000, true)

        val later = after1(1000, true)
    How does it behave?
    Quiz: when is “later” completed?
    Answer: after either ~1 s or ~2 s (most often)
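The quiz can be reproduced on a small scale. The sketch below is an illustration under assumptions, not the code behind the slide: it swaps the default 8-thread pool for a fixed pool of configurable size so the effect is deterministic, and the names `BlockingDemo` and `elapsedMillis` are hypothetical.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BlockingDemo {
  // Same shape as after1 on the slide, with the execution context made explicit.
  def after1[T](delay: Long, value: T)(implicit ec: ExecutionContext): Future[T] =
    Future {
      Thread.sleep(delay) // blocks a pool thread for the whole delay
      value
    }

  // Returns how long `later` took to complete, in milliseconds.
  // Assumption: a fixed pool of `poolSize` threads stands in for the slide's 8 processors.
  def elapsedMillis(poolSize: Int, delay: Long): Long = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val start = System.nanoTime()
      // Occupy every pool thread with a sleeping task ...
      val busy = for (_ <- 1 to poolSize) yield after1(delay, true)
      // ... so this task must wait for a free thread before it can start sleeping.
      val later = after1(delay, true)
      Await.result(later, 10.seconds)
      (System.nanoTime() - start) / 1000000
    } finally pool.shutdown()
  }
}
```

With `BlockingDemo.elapsedMillis(2, 200)`, `later` needs roughly twice the requested delay, because both pool threads are asleep when it is submitted.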
  15. Promises
        object Promise {
          def apply[T](): Promise[T]
        }

        trait Promise[T] {
          def success(value: T): Promise[T]
          def failure(cause: Throwable): Promise[T]
          def future: Future[T]
        }
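A short usage sketch of this interface, using `scala.concurrent.Promise` from the standard library. The object and method names (`PromiseDemo`, `completeOnce`) are made up for illustration.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._
import scala.util.Try

object PromiseDemo {
  // Completes a promise from the "write side" and reads the result through its future.
  def completeOnce(): (Int, Boolean) = {
    val promise = Promise[Int]()
    val future = promise.future // read side, handed to consumers
    promise.success(42)         // write side, completes the future
    // A promise can be completed only once: a second attempt throws IllegalStateException.
    val secondAttemptFailed = Try(promise.success(43)).isFailure
    (Await.result(future, 1.second), secondAttemptFailed)
  }
}
```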
  16. "after", version 2
        def after2[T](delay: Long, value: T) = {
          val promise = Promise[T]()
          timer.schedule(new TimerTask {
            def run(): Unit = promise.success(value)
          }, delay)
          promise.future
        }
    Much better behaved!
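A self-contained variant of version 2 might look as follows. The slide does not show where `timer` comes from, so the shared daemon `java.util.Timer` here is an assumption, as are the names `After2Demo` and `elapsedMillis`.

```scala
import java.util.{Timer, TimerTask}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object After2Demo {
  // Assumption: one shared daemon timer thread; the slide does not show how `timer` is created.
  private val timer = new Timer(true)

  def after2[T](delay: Long, value: T): Future[T] = {
    val promise = Promise[T]()
    timer.schedule(new TimerTask {
      def run(): Unit = promise.success(value)
    }, delay)
    promise.future // returned immediately; no pool thread sleeps
  }

  // Starts `n` delayed futures at once and reports the total wait in milliseconds.
  def elapsedMillis(n: Int, delay: Long): Long = {
    val start = System.nanoTime()
    val futures = (1 to n).map(_ => after2(delay, true))
    futures.foreach(f => Await.result(f, 10.seconds))
    (System.nanoTime() - start) / 1000000
  }
}
```

Because no thread is parked while waiting, many delayed futures started together should all complete after roughly one delay, independent of the number of processors.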
  17. Chat service example
    • Neither of the shown approaches is satisfactory
      • Thread-based approach induces huge overheads, does not scale
      • Event-driven approach suffers from callback hell and blocking operations are troublesome
    We need better programming abstractions which reconcile scalability and productivity
  18. Better programming abstractions
    • At the end of 2005, our main influence was the Erlang programming language
      • One of very few success stories in the area of concurrent programming
      • Had been used successfully to build the influential Ericsson AXD301 switch providing an availability of nine nines (less than 32 ms downtime per year)
      • … and there was a really great movie about Erlang [9] ;-)
    • Additional influences, including Argus [10], the join-calculus [11], and other seminal languages and systems
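The "less than 32 ms" figure follows from simple arithmetic: nine nines means the system may be unavailable for a fraction of 10^-9 of the time. A quick check, assuming a 365-day year (the helper names are hypothetical):

```scala
object NineNines {
  // Fraction of time the system may be down for `nines` nines of availability, e.g. 3 -> 0.001.
  def unavailability(nines: Int): Double = math.pow(10, -nines)

  // Expected downtime per 365-day year, in milliseconds.
  def downtimeMillisPerYear(nines: Int): Double = {
    val millisPerYear = 365L * 24 * 60 * 60 * 1000 // 31,536,000,000 ms
    millisPerYear * unavailability(nines)
  }
}
```

`NineNines.downtimeMillisPerYear(9)` evaluates to about 31.5 ms.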
  19. Erlang and the actor model
    • Erlang: a dynamic, functional, distributed, concurrency-oriented programming language
    • Provides an implementation of the actor model of concurrency [12]
    • Actors = concurrent "processes" communicating via message passing
    • No shared state
    • Senders decoupled from receivers ➟ asynchronous messaging
    • Upon receiving a message, an actor may
      • change its behavior/state
      • send messages to actors (including itself)
      • create new actors
    Sender does not fail if receiver fails!
  20. Actors in Scala (using Akka)
    Definition of an actor class:
        case class AddAll(values: Array[Int])
        case class PrintSum()

        class Counter extends Actor with ActorLogging {
          var sum = 0
          def receive = {
            case AddAll(values) =>
              sum += values.reduce((x, y) => x + y)
            case PrintSum() =>
              log.info(s"the sum is: $sum")
          }
        }
  21. Client of an actor
    Creating and using an actor:
        object Main {
          def main(args: Array[String]): Unit = {
            val system = ActorSystem("system")
            val counter: ActorRef = system.actorOf(Counter.props, "counter")
            counter ! AddAll(Array(1, 2, 3)) // asynchronous message sends
            counter ! AddAll(Array(4, 5))
            counter ! PrintSum()
          }
        }
    Actor creation properties:
        object Counter {
          def props: Props = Props(new Counter)
        }
  22. Actors: important features
    • Actors are isolated
      • Field sum not accessible from outside
      • Ensured by exposing only an ActorRef to clients
      • ActorRef provides an extremely simple interface
    • Messages in actor's mailbox are processed sequentially
      • No concurrency control necessary within an actor
    • Messaging is location-transparent
      • ActorRefs may be remote; can be sent in messages
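Isolation and sequential mailbox processing can be illustrated without Akka. The following is a toy sketch, not Akka's implementation: `ToyActor` is a hypothetical class whose mailbox is a single-threaded executor, and `GetSum` with a reply promise is an assumed substitute for the slide's `PrintSum` so that the result can be observed.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object ToyActorDemo {
  // Hypothetical minimal "actor": NOT Akka, just a mailbox drained by one thread.
  final class ToyActor[M](behavior: M => Unit) {
    private val mailbox = Executors.newSingleThreadExecutor()
    // Asynchronous send: enqueue the message and return immediately.
    def !(msg: M): Unit = mailbox.execute(new Runnable {
      def run(): Unit = behavior(msg)
    })
    def stop(): Unit = mailbox.shutdown()
  }

  sealed trait Msg
  final case class AddAll(values: Array[Int]) extends Msg
  final case class GetSum(replyTo: Promise[Int]) extends Msg

  def run(): Int = {
    var sum = 0 // after creation, only touched by the actor's single mailbox thread
    val counter = new ToyActor[Msg]({
      case AddAll(values)  => sum += values.sum
      case GetSum(replyTo) => replyTo.success(sum)
    })
    val reply = Promise[Int]()
    counter ! AddAll(Array(1, 2, 3))
    counter ! AddAll(Array(4, 5))
    counter ! GetSum(reply) // processed after both AddAll messages
    try Await.result(reply.future, 5.seconds) finally counter.stop()
  }
}
```

Since one thread drains the mailbox in submission order, `sum` needs no locks, and clients only ever see the `!` method.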
  23. Resiliency using actors
    • Erlang's approach to fault handling: "let it crash!"
    • Do not:
      • try to avoid failure
      • attempt to repair program state/data in case of failure
    • Do:
      • let faulty actors crash
      • manage crashed actors via supervision
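The "let it crash" idea can be sketched in a few lines. This is a synchronous illustration with hypothetical names (`Worker`, `supervise`), not how Erlang or Akka implement supervision: when the worker throws, the supervisor discards it and continues with a fresh instance instead of repairing its state.

```scala
import scala.util.{Failure, Success, Try}

object SupervisionSketch {
  // Hypothetical worker with private state; it crashes on negative input.
  final class Worker {
    private var handled = 0
    def handle(n: Int): Int = {
      if (n < 0) throw new IllegalArgumentException(s"bad message: $n")
      handled += 1
      handled
    }
  }

  // Supervisor loop: on a crash, discard the worker and continue with a fresh one.
  // Returns (number of restarts, messages handled by the final worker).
  def supervise(messages: List[Int]): (Int, Int) = {
    var worker = new Worker
    var restarts = 0
    var handledByCurrent = 0
    for (msg <- messages) {
      Try(worker.handle(msg)) match {
        case Success(count) => handledByCurrent = count
        case Failure(_) =>
          worker = new Worker // no attempt to repair the old worker's state
          restarts += 1
          handledByCurrent = 0
      }
    }
    (restarts, handledByCurrent)
  }
}
```

For the messages `List(1, 2, -1, 3)`, the worker is restarted once and the fresh worker handles the final message.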
  24. Actor supervision: strategy 1
  25. Actor supervision: strategy 2
  26. Actor supervision: strategy 3
  27. Resiliency (continued)
    How to restart a fresh actor from some previous state?
    • Supervisor initializes its state, or
    • Fresh actor obtains initial state from elsewhere, or
    • Fresh actor replays received messages from persistent log ➟ event sourcing: Akka Persistence
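The replay option amounts to folding the logged events over an initial state. A minimal sketch with hypothetical event types (`Added`, `Reset`); a real system would read the log from persistent storage rather than from an in-memory sequence:

```scala
object ReplaySketch {
  // Hypothetical events of a counter; assumed to have been written to a persistent log.
  sealed trait Event
  final case class Added(n: Int) extends Event
  case object Reset extends Event

  // Pure state transition: applying one event to the current state.
  def applyEvent(sum: Int, event: Event): Int = event match {
    case Added(n) => sum + n
    case Reset    => 0
  }

  // A fresh actor recovers by folding over the logged events from the initial state.
  def recover(log: Seq[Event]): Int = log.foldLeft(0)(applyEvent)
}
```

Keeping `applyEvent` free of side effects is what makes replay safe: recovering twice from the same log yields the same state.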
  28. Actors in Scala
    • Q: Is all of this built into Scala?
    • A: Not quite.
  29. Deconstructing actors
        def receive = {
          case AddAll(values) =>
            sum += values.reduce((x, y) => x + y)
          case PrintSum() =>
            log.info(s"the sum is: $sum")
        }
    • receive method returns a partial function defined by the block of cases { … }
  30. Deconstructing actors
        object Actor {
          // Type alias for receive blocks
          type Receive = PartialFunction[Any, Unit]
          // ...
        }

        trait Actor {
          def receive: Actor.Receive
          // ...
        }
  31. Partial functions
    • Partial functions have a type PartialFunction[A, B]
    • PartialFunction[A, B] is a subtype of Function1[A, B]
        trait Function1[A, B] {
          def apply(x: A): B
          ..
        }

        trait PartialFunction[A, B] extends Function1[A, B] {
          def isDefinedAt(x: A): Boolean
          def orElse[A1 <: A, B1 >: B]
            (that: PartialFunction[A1, B1]): PartialFunction[A1, B1]
          ..
        }
    Simplified!
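The two methods shown for `PartialFunction` can be tried out directly with the standard library. The values below (`positive`, `zero`, `nonNegative`) are illustrative names only.

```scala
object PartialFunctionDemo {
  // A block of cases becomes a PartialFunction when that type is expected.
  val positive: PartialFunction[Int, String] = {
    case n if n > 0 => s"positive: $n"
  }
  val zero: PartialFunction[Int, String] = {
    case 0 => "zero"
  }
  // orElse tries `positive` first and falls back to `zero`.
  val nonNegative: PartialFunction[Int, String] = positive orElse zero
}
```

`isDefinedAt` lets a caller, such as an actor runtime, ask whether a message would be handled before applying the function; here `nonNegative.isDefinedAt(-1)` is false.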
  32. Pattern matching
    The case clauses are just regular pattern matching in Scala:
        {
          case AddAll(values) =>
            sum += values.reduce((x, y) => x + y)
          case PrintSum() =>
            log.info(s"the sum is: $sum")
        }

        val opt: Option[Int] = this.getOption()
        opt match {
          case Some(x) => // full optional object
            // use `x` of type `Int`
          case None => // empty optional object
            // no value available
        }
  33. Deconstructing actors
        counter ! AddAll(Array(1, 2, 3))
        counter ! AddAll(Array(4, 5))
        counter ! PrintSum()
    "Aha! Built-in support for messaging!!"
    The ! operator is just a method written using infix syntax:
        abstract class ActorRef extends .. {
          def !(message: Any): Unit
          // ..
        }
    Simplified! Not actual implementation!
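That `!` is an ordinary method can be checked with any class that defines it. `RecordingRef` below is a hypothetical stand-in that only records messages; it is not related to Akka's `ActorRef`.

```scala
object InfixDemo {
  // Hypothetical stand-in for ActorRef: records messages instead of delivering them.
  final class RecordingRef {
    private val buffer = scala.collection.mutable.ListBuffer.empty[Any]
    def !(message: Any): Unit = buffer += message
    def received: List[Any] = buffer.toList
  }

  def run(): List[Any] = {
    val ref = new RecordingRef
    ref ! "hello"  // infix syntax
    ref.!("world") // the same method, called with ordinary dot syntax
    ref.received
  }
}
```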
  34. Summary
    • Actors not built into Scala
      • Rely only on shared-memory threads of the JVM
    • Scala as a "growable" language [13]
      • Programming models as libraries
      • Akka actors = domain-specific language (DSL) embedded in Scala
    • Many of the patterns and techniques first implemented in Scala Actors [14]
  35. https://www.lightbend.com/akka-five-year-anniversary
  36. There is more
    • Q: Actors are clearly awesome! All problems solved?
    • A: Not quite.
  37. Example
    Image processing pipeline: image data ➟ filter 1 ➟ filter 2 (each stage applies a filter)
    Pipeline stages run concurrently
  38. Implementation
    • Assumptions:
      • Image data large
      • Main memory expensive
    • Approach for high performance:
      • In-place update of image buffers
      • Pass mutable buffers by-reference
  39. Problem
    Easy to produce data races:
    1. Stage 1 sends a reference to a buffer to stage 2
    2. Following the send, both stages have a reference to the same buffer
    3. Stages can concurrently access the buffer
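The root cause is aliasing, which can be shown deterministically even without threads. In this sketch both stages are simulated in a single thread (an assumption made to keep the outcome reproducible); with real concurrent stages the same aliasing becomes a data race.

```scala
object AliasingDemo {
  // "Sending" by reference: both stages end up with the same array.
  def sharedBuffer(): Int = {
    val buffer = Array(1, 2, 3)
    val receivedByStage2 = buffer // a reference is passed, not a copy
    buffer(0) = 99                // stage 1 keeps writing after the send
    receivedByStage2(0)           // stage 2 observes the write
  }

  // Defensive copy: safe, but costs memory and time for large images.
  def copiedBuffer(): Int = {
    val buffer = Array(1, 2, 3)
    val receivedByStage2 = buffer.clone()
    buffer(0) = 99
    receivedByStage2(0)
  }
}
```

`sharedBuffer()` returns the overwritten value, while `copiedBuffer()` returns the original one, at the price of duplicating the buffer.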
  40. Preventing data races
    • Approach: safe transfer of ownership
      • Sending stage loses ownership
      • Compiler prevents sender from accessing objects that have been transferred
    • Advantages:
      • No run-time overhead
      • Safety does not compromise performance
      • Errors caught at compile time
  41. Ownership transfer in Scala
    • Active research project: LaCasa [15]
    • LaCasa: Scala extension for affine references
      • "Transferable" references
      • At most one owner per transferable reference
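To get a feeling for "at most one owner", here is a run-time approximation. It is explicitly not LaCasa and does not use its API: `OwnedBox` is a hypothetical class that checks at run time what LaCasa is described as enforcing at compile time, so it gives up the "no run-time overhead" advantage.

```scala
object MoveOnceSketch {
  // Hypothetical runtime-checked box. LaCasa enforces this at compile time instead.
  final class OwnedBox[T](initial: T) {
    private var content: Option[T] = Some(initial)
    // Transfers ownership: the first call yields the content, later calls fail.
    def take(): T = synchronized {
      val value = content.getOrElse(
        throw new IllegalStateException("ownership already transferred"))
      content = None
      value
    }
  }

  def run(): (Int, Boolean) = {
    val box = new OwnedBox(Array(1, 2, 3))
    val received = box.take() // the new owner gets the buffer
    val secondTakeFails = scala.util.Try(box.take()).isFailure
    (received.length, secondTakeFails)
  }
}
```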
  42. Affine references in LaCasa
    • LaCasa provides affine references by combining two concepts:
      • Access permissions
      • Encapsulated boxes
  43. Access permissions
    • Access to transferable objects controlled by implicit permissions
        CanAccess { type C }
        Box[T] { type C }
    • Type member C uniquely identifies box
  44. Creating boxes and permissions
        class Message {
          var arr: Array[Int] = _
        }

        mkBox[Message] { packed =>
          implicit val access = packed.access
          val box = packed.box
          …
        }
    LaCasa library:
        sealed trait Packed[+T] {
          val box: Box[T]
          val access: CanAccess { type C = box.C }
        }
  45. Accessing boxes
    • Boxes are encapsulated
    • Boxes must be opened for access
        mkBox[Message] { packed =>
          implicit val access = packed.access
          val box = packed.box
          box open { msg => // requires implicit access permission
            msg.arr = Array(1, 2, 3, 4)
          }
        }
  46. Consuming permissions
    Example: transferring a box from one actor to another consumes its access permission
        mkBox[Message] { packed =>
          implicit val access = packed.access
          val box = packed.box
          …
          someActor.send(box) {
            // make `access` unavailable
            …
          }
        }
    Leverage spores [16]
  47. Encapsulation
    Problem: not all types safe to transfer!
        class Message {
          var arr: Array[Int] = _
          def leak(): Unit = {
            SomeObject.fld = arr
          }
        }

        object SomeObject {
          var fld: Array[Int] = _
        }
  48. Encapsulation
    • Ensuring absence of data races requires restricting types put into boxes
    • Requirements for “safe” classes:*
      • Methods only access parameters and this
      • Method parameter types are “safe”
      • Methods only instantiate “safe” classes
      • Types of fields are “safe”
    “Safe” = conforms to object capability model [17]
    * simplified
  49. Object capabilities in Scala
    • How common is object-capability safe code in Scala?
    • Empirical study of over 75,000 SLOC of open-source Scala code:
        Project          Version      SLOC     GitHub stats
        Scala stdlib     2.11.7       33,107   ✭5,795  257
        Signal/Collect   8.0.6        10,159   ✭123    11
        GeoTrellis       0.10.0-RC2   35,351   ✭400    38
          -engine                      3,868
          -raster                     22,291
          -spark                       9,192
  50. Object capabilities in Scala
    Results of empirical study:
        Project          #classes/traits   #ocap (%)      #dir. insec. (%)
        Scala stdlib     1,505             644 (43%)      212/861 (25%)
        Signal/Collect   236               159 (67%)      60/77 (78%)
        GeoTrellis
          -engine        190               40 (21%)       124/150 (83%)
          -raster        670               233 (35%)      325/437 (74%)
          -spark         326               101 (31%)      167/225 (74%)
        Total            2,927             1,177 (40%)    888/1,750 (51%)
    Immutability inference increases these percentages!
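The totals row can be re-derived from the per-project rows. The snippet copies the numbers from the table; reading the denominator of the last column as "classes that are not ocap-safe" is an inference from the figures (for example 1,505 - 644 = 861), not something the slide states.

```scala
object OcapStats {
  // Rows copied from the table: (project, classes/traits, ocap-safe, directly insecure).
  val rows: List[(String, Int, Int, Int)] = List(
    ("Scala stdlib",       1505, 644, 212),
    ("Signal/Collect",      236, 159,  60),
    ("GeoTrellis -engine",  190,  40, 124),
    ("GeoTrellis -raster",  670, 233, 325),
    ("GeoTrellis -spark",   326, 101, 167)
  )

  val totalClasses: Int  = rows.map(_._2).sum
  val totalOcap: Int     = rows.map(_._3).sum
  val totalInsecure: Int = rows.map(_._4).sum

  // Rounded percentage of a part relative to a whole.
  def percent(part: Int, whole: Int): Long = math.round(100.0 * part / whole)
}
```

The sums reproduce the totals row: 2,927 classes and traits, 1,177 of them ocap-safe (40%), and 888 of the remaining 1,750 directly insecure (51%).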
  51. Ongoing work
    • Flow-sensitive type checking
      • "Don't indent when consuming permission"
    • Empirical studies
      • How much effort to change existing code?
    • Language support for immutable types [18]
    • Complete mechanization in Coq proof assistant
  52. Conclusion
    • Scala enables powerful libraries for reactive programming
      • Akka actors representative example
      • There are many others: Akka Streams, Spark Streaming, REScala [19] etc.
    • Not all concurrency hazards can be prevented by Scala's current type system.
    • In ongoing research projects, such as LaCasa and Reactive Async [20], we are exploring ways to rule out data races and non-determinism
  53. References (1)
    • [1]: Gérard Berry, 1989. http://www-sop.inria.fr/members/Gerard.Berry/Papers/Berry-IFIP-89.pdf
    • [2]: https://www.reactivemanifesto.org/
    • [3]: http://store.steampowered.com/stats/content/
    • [4]: https://www.itbusinessedge.com/cm/blogs/lawson/the-big-data-software-problem-behind-cerns-higgs-boson-hunt/?cs=50736
    • [5]: https://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/
    • [6]: https://blog.twitter.com/2009/inauguration-day-twitter
    • [7]: http://www.erlang-factory.com/upload/presentations/45/keynote_joearmstrong.pdf
    • [8]: http://static.usenix.org/publications/library/proceedings/usenix02/full_papers/adyahowell/adyahowell_html/
    • [9]: https://www.youtube.com/watch?v=uKfKtXYLG78
    • [10]: Liskov, 1988. Distributed programming in Argus. Communications of the ACM, 31(3), pp. 300-312. https://dl.acm.org/citation.cfm?id=42399
  54. References (2)
    • [11]: Fournet and Gonthier, 1996. The reflexive CHAM and the join-calculus. Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (pp. 372-385). https://dl.acm.org/citation.cfm?id=237805
    • [12]: Hewitt, Bishop, and Steiger, 1973. A universal modular actor formalism for artificial intelligence. Proc. IJCAI. See also https://eighty-twenty.org/2016/10/18/actors-hopl
    • [13]: Guy Steele, 1998. "Growing a Language". OOPSLA keynote. https://www.youtube.com/watch?v=_ahvzDzKdB0
    • [14]: Haller and Odersky, 2007. Actors that unify threads and events. In International Conference on Coordination Languages and Models (pp. 171-190). Springer, Berlin, Heidelberg. https://link.springer.com/chapter/10.1007/978-3-540-72794-1_10
    • [15]: https://github.com/phaller/lacasa
    • [16]: Miller, Haller, and Odersky, 2014. Spores: A type-based foundation for closures in the age of concurrency and distribution. Proc. ECOOP. https://github.com/scalacenter/spores
    • [17]: Mark S. Miller, 2006. Robust Composition: Towards a Unified Approach to Access Control and Concurrency Control. PhD thesis.
    • [18]: https://www.youtube.com/watch?v=IiCt4nZfQfg
    • [19]: http://guidosalva.github.io/REScala/
    • [20]: https://github.com/phaller/reactive-async