Akka Streams. Introduction (Zalando version)

A talk about Akka Streams given at the SPb Scala meetup (https://www.meetup.com/ScalaSpb/events/243298982/) on 04.10.2017.

Ivan Yurchenko

October 05, 2017

  1. ABOUT ME
     • Ivan Yurchenko.
     • Currently working at Zalando in Helsinki.
     • Have worked in several teams: mobile backend, search, domain knowledge service.
     • Mostly use Scala now.
     • Contacts:
       ◦ [email protected]
       ◦ https://ivanyu.me/
       ◦ https://linkedin.com/in/ivanyurchenko/
       ◦ https://twitter.com/ivan0yu
  4. MOTIVATION
     • Data processing is often a pipeline of stages.
     • Pipelines might be complex: asynchronous stages of different speeds, I/O, complex topology (merges, broadcasts, etc.).
     • This implies buffering, queues, congestion control, etc., which can be difficult to get right.
     • Actor systems are technically good for this, but quite low-level => bug-prone and lots of boilerplate.
     • High-level programming libraries (Rx*), frameworks (Apache Camel) and systems (Apache Storm, Twitter Heron, Apache Flink, etc.) exist.
  5. AKKA STREAMS
     • A way to build arbitrarily complex, type-safe data processing pipelines
     • Pipelines consist of stages
     • Stages are composable and reusable
     • Stages might be complex and consist of smaller sub-pipelines
     • Stages can be executed asynchronously (in different ExecutionContexts)
     • Not distributed [yet]
     • New: compatible with Java 9’s java.util.concurrent.Flow
  6. AKKA STREAMS BASICS
     In general: data processing is passing data through an arbitrarily complex graph of transformations/actions.
     Most common: Source → Flow → … → Flow → Sink
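  7. AKKA STREAMS BASICS
     import akka.NotUsed
     import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

     // Explicit form: Source, then a separate Flow, then a Sink.
     val helloWorldStream1: RunnableGraph[NotUsed] =
       Source.single("Hello world")
         .via(Flow[String].map(s => s.toUpperCase()))
         .to(Sink.foreach(println))

     // Shorthand: the same transformation applied directly on the Source.
     val helloWorldStream2: RunnableGraph[NotUsed] =
       Source.single("Hello world")
         .map(s => s.toUpperCase())
         .to(Sink.foreach(println))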
  8. MATERIALIZATION
     Materializer (interface) -- ActorMaterializer (implementation)

     implicit val actorSystem = ActorSystem("akka-streams-example")
     implicit val materializer = ActorMaterializer()
     helloWorldStream1.run() // prints: HELLO WORLD
  9. LOTS OF STAGES OUT OF THE BOX
     Source: fromIterator, single, repeat, cycle, tick, fromFuture, unfold, empty, failed, actorPublisher, actorRef, queue, fromPath, ...
     Sink: head, headOption, last, lastOption, ignore, cancelled, seq, foreach, foreachParallel, queue, fold, reduce, actorRef, actorRefWithAck, actorSubscriber, toPath, ...
     Flow: map, mapAsync, mapConcat, statefulMapConcat, filter, grouped, sliding, scan, scanAsync, fold, foldAsync, take, takeWhile, drop, dropWhile, recover, recoverWith, throttle, intersperse, limit, delay, buffer, monitor, ...
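As an illustration of how the stock stages listed above compose, here is a minimal sketch that chains filter, map and grouped into a reusable Flow and collects the output with Sink.seq. It assumes Akka 2.5.x (hence ActorMaterializer); the object name, the flow name and the numbers are made up for this example and are not part of the talk.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

object BuiltInStagesExample {
  // A reusable Flow built only from stock operators:
  // keep even numbers, square them, and group the results in pairs.
  val evenSquaresInPairs: Flow[Int, immutable.Seq[Int], NotUsed] =
    Flow[Int].filter(_ % 2 == 0).map(n => n * n).grouped(2)

  def run(): immutable.Seq[immutable.Seq[Int]] = {
    implicit val system: ActorSystem = ActorSystem("built-in-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      // Sink.seq materializes a Future holding all elements that reached the sink.
      val resultF = Source(1 to 10).via(evenSquaresInPairs).runWith(Sink.seq)
      // Blocking is acceptable in a small demo; avoid it in production code.
      Await.result(resultF, 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because evenSquaresInPairs is an ordinary value, it can be attached to any Source of Int, which is the composability and reusability the earlier slide refers to.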
  10. MATERIALIZED VALUES
      • It’s something that we get when a stream is materialized by a Materializer
      • Not the result of a stream (a stream might not even have a result as such)
      • Each stage creates its own materialized value
      • It’s up to us which one we want to have at the end
  11. MATERIALIZED VALUES
      NotUsed – a materialized value, but not really useful:

      val helloWorldStream1: RunnableGraph[NotUsed] =
        Source.single("Hello world")
          .via(Flow[String].map(s => s.toUpperCase()))
          .to(Sink.foreach(println))
      val materializedValue: NotUsed = helloWorldStream1.run()

      Future[Done] – much more useful:

      val helloWorldStream2: RunnableGraph[Future[Done]] =
        Source.single("Hello world")
          .map(s => s.toUpperCase())
          .toMat(Sink.foreach(println))(Keep.right)
      val doneF: Future[Done] = helloWorldStream2.run()
      doneF.onComplete { … }
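To make the "it's up to us which one we keep" point concrete, the following sketch attaches two sinks to one source and keeps both of their materialized values with Keep.both. It assumes Akka 2.5.x; the object name and the count/sum sinks are illustrative and not from the talk.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object KeepExample {
  def run(): (Int, Int) = {
    implicit val system: ActorSystem = ActorSystem("keep-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val graph: RunnableGraph[(Future[Int], Future[Int])] =
        Source(1 to 5)
          // Side sink that counts elements; Keep.right drops the source's
          // NotUsed and keeps the sink's Future[Int].
          .alsoToMat(Sink.fold[Int, Int](0)((count, _) => count + 1))(Keep.right)
          // Main sink that sums elements; Keep.both pairs the value kept so far
          // (the count) with this sink's value (the sum).
          .toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.both)

      val (countF, sumF) = graph.run()
      (Await.result(countF, 5.seconds), Await.result(sumF, 5.seconds))
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Using Keep.left in the last step would instead yield only the count, and plain .to(...) is shorthand for .toMat(...)(Keep.left).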
  12. KILL SWITCHES
      val stream: RunnableGraph[(UniqueKillSwitch, Future[Done])] =
        Source.single("Hello world")
          .map(s => s.toUpperCase())
          // Keep the kill switch as the materialized value so far.
          .viaMat(KillSwitches.single)(Keep.right)
          // Keep both the kill switch and the sink's Future[Done].
          .toMat(Sink.foreach(println))(Keep.both)

      val (killSwitch, doneF): (UniqueKillSwitch, Future[Done]) = stream.run()

      killSwitch.shutdown()
      // or
      killSwitch.abort(new Exception("Exception from KillSwitch"))
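A kill switch is most useful on a stream that would otherwise never finish. The sketch below, assuming Akka 2.5.x, runs an endless Source.tick and stops it from the outside; the object name, the tick interval and the sleep are arbitrary choices for this illustration.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, KillSwitches}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object KillSwitchExample {
  def run(): Done = {
    implicit val system: ActorSystem = ActorSystem("kill-switch-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val (killSwitch, doneF) =
        // An endless source: emits "tick" every 10 ms until stopped.
        Source.tick(0.millis, 10.millis, "tick")
          .viaMat(KillSwitches.single)(Keep.right)
          .toMat(Sink.ignore)(Keep.both)
          .run()

      Thread.sleep(100)     // let the stream run for a moment
      killSwitch.shutdown() // completes downstream and cancels upstream

      // After shutdown() the sink's Future completes successfully.
      Await.result(doneF, 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Calling abort(exception) instead of shutdown() would fail the Future with that exception rather than completing it.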
  13. BACK PRESSURE
      • Different speeds of stages (producer/consumer) cause problems
      • We know how to deal with these problems
      • Back pressure – a mechanism for the consumer to signal to the producer how much incoming data it can accept
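The effect of back pressure can be observed with a small experiment: a source that could produce numbers without limit is connected to a deliberately slow consumer, and a counter records how many elements were actually pulled from it. This is only a sketch assuming Akka 2.5.x; the object name, the throttle rate and the counter are illustrative, and the exact number of pulled elements depends on internal buffering.

```scala
import java.util.concurrent.atomic.AtomicInteger
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

object BackPressureExample {
  // Returns the elements that reached the sink and how many were pulled from the source.
  def run(): (immutable.Seq[Int], Int) = {
    implicit val system: ActorSystem = ActorSystem("back-pressure-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    val produced = new AtomicInteger(0)
    try {
      val resultF =
        // A potentially infinite, very fast producer.
        Source.fromIterator(() => Iterator.from(1).map { n => produced.incrementAndGet(); n })
          // A slow stage: lets through at most one element per 50 ms.
          .throttle(1, 50.millis, 1, ThrottleMode.Shaping)
          .take(5)
          .runWith(Sink.seq)

      val result = Await.result(resultF, 10.seconds)
      (result, produced.get())
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = {
    val (result, produced) = run()
    println(s"consumed: $result, pulled from source: $produced")
  }
}
```

Although the iterator could yield millions of numbers in the time the stream runs, the source is only asked for a new element when the slow stage signals demand, so the counter stays close to the number of consumed elements.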

    single HTTP GET request
    • Receive an infinite HTTP response
    • One line = one event batch – need to parse
    • Process batches

    val http = Http(actorSystem)
    // outgoingConnectionHttps expects a host name, not a URL.
    val nakadiConnectionFlow = http.outgoingConnectionHttps("nakadi-url.com", 443)
    val getRequest = HttpRequest(HttpMethods.GET, "/")

    val eventBatchSource: Source[EventBatch, NotUsed] =
      // The stream starts with a single request object ...
      Source.single(getRequest)
        // ... that goes through a connection (i.e. is sent to the server)
        .via(nakadiConnectionFlow)
        .flatMapConcat {
          case response @ HttpResponse(StatusCodes.OK, _, _, _) =>
            response.entity.dataBytes
              // Decompress deflate-compressed bytes.
              .via(Deflate.decoderFlow)
              // Coalesce chunks into a line.
              .via(Framing.delimiter(ByteString("\n"), Int.MaxValue))
              // Deserialize JSON.
              .map(bs => Json.read[EventBatch](bs.utf8String))
          // process erroneous responses
        }

    eventBatchSource.map(...).to(...) // process batches
  16. GraphDSL
      import akka.stream.scaladsl.GraphDSL.Implicits._

      RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
        val A: Outlet[Int]                  = builder.add(Source.single(0)).out
        val B: UniformFanOutShape[Int, Int] = builder.add(Broadcast[Int](2))
        val C: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
        val D: FlowShape[Int, Int]          = builder.add(Flow[Int].map(_ + 1))
        val E: UniformFanOutShape[Int, Int] = builder.add(Balance[Int](2))
        val F: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
        val G: Inlet[Any]                   = builder.add(Sink.foreach(println)).in

                      C     <~      F
        A  ~>  B  ~>  C     ~>      F
               B  ~>  D  ~>  E  ~>  F
                             E  ~>  G

        ClosedShape
      })
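For readers new to the DSL, a smaller acyclic graph may be easier to follow than the one on the slide. The sketch below, assuming Akka 2.5.x, broadcasts each number to two branches and zips the branch outputs back together; the object name and the branch transformations are invented for illustration. Passing Sink.seq to GraphDSL.create makes the sink's Future available as the graph's materialized value.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, RunnableGraph, Sink, Source, Zip}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

object GraphDslExample {
  def run(): immutable.Seq[(Int, Int)] = {
    implicit val system: ActorSystem = ActorSystem("graph-dsl-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val graph = RunnableGraph.fromGraph(
        GraphDSL.create(Sink.seq[(Int, Int)]) { implicit builder => sink =>
          import GraphDSL.Implicits._

          val bcast = builder.add(Broadcast[Int](2)) // one input, two identical outputs
          val zip   = builder.add(Zip[Int, Int]())   // pairs up elements of two inputs

          Source(1 to 3) ~> bcast
          bcast.out(0) ~> Flow[Int].map(_ * 10) ~> zip.in0
          bcast.out(1) ~> Flow[Int].map(_ + 1)  ~> zip.in1
          zip.out ~> sink

          ClosedShape // every inlet and outlet is connected
        })

      Await.result(graph.run(), 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Zip is used here instead of Merge so that the output order is deterministic: each pair combines the two branch results for the same input element.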
  17. INTEGRATION WITH AKKA ACTORS
      • An actor can be a Source or a Sink
      • The back pressure protocol – normal actor messages

      class LongCounter extends ActorPublisher[Long] {
        private var counter = 0L

        override def receive: Receive = {
          // Downstream requests n more elements: emit exactly n
          // (a range starting at 0 would emit n + 1 and exceed the demand).
          case ActorPublisherMessage.Request(n) =>
            for (_ <- 1L to n) {
              counter += 1
              onNext(counter)
            }
          // Downstream cancelled: stop the actor.
          case ActorPublisherMessage.Cancel =>
            context.stop(self)
        }
      }
  18. CONCLUSION
      • Akka Streams – a way to build arbitrarily complex, type-safe data processing pipelines
      • Complex inside, but the interface is reasonably simple
      • Gives control over execution, including back pressure and asynchronous execution
      • Don’t misuse it: it might not be suitable for the task

    – HTTP client and server
      http://doc.akka.io/docs/akka-http/current/scala.html
    • Alpakka – enterprise integration patterns (like Apache Camel) (WIP)
      http://developer.lightbend.com/docs/alpakka/current/
