Akka Streams. Introduction

A talk about Akka Streams given in Zalando Finland on 09.03.2017.

Ivan Yurchenko

March 09, 2017

  1. Motivation

    • Often data processing is a pipeline of stages
    • The pipeline might be complex: asynchronous stages of different speeds, I/O, complex topology (merges, broadcasts, etc.)
    • This implies buffering, queues, congestion control, etc., which can be difficult to get right
    • Actor systems are technically good for this, but quite low-level => bug-prone and lots of boilerplate
    • High-level programming libraries (Rx*), frameworks (Apache Camel) and systems (Apache Storm, Twitter Heron, Apache Flink, etc.) exist
  2. Akka Streams

    • A way to build arbitrarily complex type-safe data processing pipelines
    • Pipelines consist of stages
    • Stages are composable and reusable
    • Stages might be complex and consist of smaller sub-pipelines
    • Stages can be executed asynchronously (in different ExecutionContexts)
    • Not distributed [yet]
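
The composability described above can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the stage names (`trim`, `nonEmpty`, `normalize`) and the object name are hypothetical, and it assumes the Akka 2.4/2.5-era API used elsewhere in the deck, with an explicit `ActorMaterializer` (Akka 2.6+ can derive a materializer from the implicit `ActorSystem` instead).

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object ComposableStages {
  // Two small reusable stages (hypothetical names, for illustration only).
  val trim: Flow[String, String, NotUsed] = Flow[String].map(_.trim)
  val nonEmpty: Flow[String, String, NotUsed] = Flow[String].filter(_.nonEmpty)

  // A larger stage composed of the smaller ones.
  // `.async` marks an asynchronous boundary around the composed stage.
  val normalize: Flow[String, String, NotUsed] = trim.via(nonEmpty).async

  def run(input: List[String]): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("composable-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try Await.result(Source(input).via(normalize).runWith(Sink.seq), 3.seconds)
    finally system.terminate()
  }

  def main(args: Array[String]): Unit =
    println(run(List(" a ", "", " b")))
}
```

The composed `normalize` flow can be plugged into any other pipeline of strings with `.via(normalize)`, exactly like a built-in stage.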
  3. Akka Streams basics

    In general: data processing is passing data through an arbitrarily complex graph of transformations/actions

    Most common: Source → Flow → … → Flow → Sink
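  4. Akka Streams basics

        import akka.NotUsed
        import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

        // With an explicit Flow stage attached via `via`.
        val helloWorldStream1: RunnableGraph[NotUsed] =
          Source.single("Hello world")
            .via(Flow[String].map(s => s.toUpperCase()))
            .to(Sink.foreach(println))

        // Equivalent shortcut: `map` directly on the Source.
        val helloWorldStream2: RunnableGraph[NotUsed] =
          Source.single("Hello world")
            .map(s => s.toUpperCase())
            .to(Sink.foreach(println))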
  5. Materialization

    Materializer (interface) -- ActorMaterializer (implementation)

        implicit val actorSystem = ActorSystem("akka-streams-example")
        implicit val materializer = ActorMaterializer()

        // Run one of the graphs defined on the previous slide.
        helloWorldStream1.run()
        // prints: HELLO WORLD
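
To make the materialization step more concrete, here is a small self-contained sketch. It is an illustration rather than the talk's code: the object and method names are hypothetical, and `Sink.head` with `Keep.right` is used only so that each run hands back its result. It assumes the same explicit `ActorMaterializer` setup as the slide. The point it tries to show is that a `RunnableGraph` is only a blueprint and that each `run()` materializes a new, independent stream.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object MaterializationExample {
  // Only a description of the pipeline; nothing runs yet.
  val helloWorldStream: RunnableGraph[Future[String]] =
    Source.single("Hello world")
      .map(_.toUpperCase)
      .toMat(Sink.head[String])(Keep.right)

  def runTwice(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("akka-streams-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      // Each run() materializes a new, independent running stream.
      val first = helloWorldStream.run()
      val second = helloWorldStream.run()
      Seq(Await.result(first, 3.seconds), Await.result(second, 3.seconds))
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit =
    runTwice().foreach(println)
}
```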
  6. Lots of stages out of the box

    Source: fromIterator, single, repeat, cycle, tick, fromFuture, unfold, empty, failed, actorPublisher, actorRef, queue, fromPath, ...

    Sink: head, headOption, last, lastOption, ignore, cancelled, seq, foreach, foreachParallel, queue, fold, reduce, actorRef, actorRefWithAck, actorSubscriber, toPath, ...

    Flow: map, mapAsync, mapConcat, statefulMapConcat, filter, grouped, sliding, scan, scanAsync, fold, foldAsync, take, takeWhile, drop, dropWhile, recover, recoverWith, throttle, intersperse, limit, delay, buffer, monitor, ...
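
As a rough illustration of how a few of the stages listed above fit together, the following sketch combines `fromIterator`, `filter`, `take`, `grouped`, `map` and `Sink.seq`. The object name and the numbers are made up for the example, and it assumes the explicit `ActorMaterializer` setup used in the deck.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object BuiltInStages {
  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("built-in-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val result =
        Source.fromIterator(() => Iterator.from(1)) // infinite: 1, 2, 3, ...
          .filter(_ % 2 == 0)                       // 2, 4, 6, ...
          .take(6)                                  // 2, 4, 6, 8, 10, 12, then complete
          .grouped(3)                               // (2, 4, 6), (8, 10, 12)
          .map(_.sum)                               // 12, 30
          .runWith(Sink.seq)                        // collect everything into a Seq
      Await.result(result, 3.seconds)
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Note that `take(6)` cancels the infinite upstream once it has seen enough elements, so the iterator is never exhausted.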
  7. Materialized values

    • A materialized value is what we get when a stream is materialized by a Materializer
    • It is not the result of a stream (a stream might not even have a result as such)
    • Each stage creates its own materialized value
    • It’s up to us which one we want to have at the end
  8. Materialized values

    NotUsed – a materialized value, but not really useful

        val helloWorldStream1: RunnableGraph[NotUsed] =
          Source.single("Hello world")
            .via(Flow[String].map(s => s.toUpperCase()))
            .to(Sink.foreach(println)) // `to` keeps the source's materialized value

        val materializedValue: NotUsed = helloWorldStream1.run()

    Future[Done] – much more useful

        val helloWorldStream2: RunnableGraph[Future[Done]] =
          Source.single("Hello world")
            .map(s => s.toUpperCase())
            .toMat(Sink.foreach(println))(Keep.right) // keep the sink's materialized value

        val doneF: Future[Done] = helloWorldStream2.run()
        doneF.onComplete { … }
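
The choice between materialized values can be sketched side by side. In this hedged example (the object and value names are hypothetical; the setup again assumes an explicit `ActorMaterializer`), the same source and sink are combined with `Keep.left`, `Keep.right` and `Keep.both`, and only the type of the resulting `RunnableGraph` changes.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object KeepExample {
  val source: Source[Int, NotUsed] = Source(1 to 10)
  // This sink materializes a Future that completes with the sum of all elements.
  val sumSink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

  // Same stages, different materialized value depending on the combiner.
  val keepLeft: RunnableGraph[NotUsed] = source.toMat(sumSink)(Keep.left)
  val keepRight: RunnableGraph[Future[Int]] = source.toMat(sumSink)(Keep.right)
  val keepBoth: RunnableGraph[(NotUsed, Future[Int])] = source.toMat(sumSink)(Keep.both)

  def run(): Int = {
    implicit val system: ActorSystem = ActorSystem("keep-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try Await.result(keepRight.run(), 3.seconds)
    finally system.terminate()
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```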
  9. Kill switches

        val stream: RunnableGraph[(UniqueKillSwitch, Future[Done])] =
          Source.single("Hello world")
            .map(s => s.toUpperCase())
            .viaMat(KillSwitches.single)(Keep.right)
            .toMat(Sink.foreach(println))(Keep.both)

        val (killSwitch, doneF): (UniqueKillSwitch, Future[Done]) = stream.run()

        killSwitch.shutdown()
        // or
        killSwitch.abort(new Exception("Exception from KillSwitch"))
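
A kill switch is easier to appreciate on a stream that never completes by itself. The sketch below is an assumption-laden illustration, not the talk's code: the object name, the tick interval and the 200 ms sleep are arbitrary. It stops an endless `Source.tick` from the outside and then collects what was emitted.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, KillSwitches}
import akka.stream.scaladsl.{Keep, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object KillSwitchExample {
  def run(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("kill-switch-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val (killSwitch, resultF) =
        Source.tick(0.millis, 10.millis, "tick")           // never completes on its own
          .viaMat(KillSwitches.single[String])(Keep.right) // keep the kill switch
          .toMat(Sink.seq[String])(Keep.both)              // keep the switch and the result
          .run()

      Thread.sleep(200)     // let the stream run for a while
      killSwitch.shutdown() // complete the stream gracefully from the outside
      Await.result(resultF, 3.seconds)
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit =
    println(s"Received ${run().size} ticks before shutdown")
}
```

With `killSwitch.abort(...)` instead of `shutdown()`, the materialized future would fail with the given exception rather than complete with the collected elements.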
  10. Back pressure

    • Different speeds of stages (producer/consumer) cause problems
    • We know how to deal with these problems
    • Back pressure – a mechanism for the consumer to signal to the producer how much incoming data it has capacity for
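
One way to observe back pressure is to count how many elements a fast producer is actually asked for when the downstream is slow. The following is a minimal sketch under stated assumptions: the object name, the counter and the throttle rate are made up for the illustration, and `throttle` merely stands in for a slow consumer.

```scala
import java.util.concurrent.atomic.AtomicInteger

import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object BackPressureExample {
  /** Returns the consumed elements and the number of elements the producer generated. */
  def run(): (Seq[Int], Int) = {
    implicit val system: ActorSystem = ActorSystem("back-pressure-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    val produced = new AtomicInteger(0)
    try {
      val resultF =
        // An infinite producer that could emit as fast as the CPU allows ...
        Source.fromIterator(() => Iterator.from(1).map { n => produced.incrementAndGet(); n })
          // ... followed by a slow stage: at most 10 elements per 100 ms.
          .throttle(10, 100.millis, 10, ThrottleMode.Shaping)
          .take(20)
          .runWith(Sink.seq)
      val consumed = Await.result(resultF, 10.seconds)
      (consumed, produced.get())
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = {
    val (consumed, produced) = run()
    println(s"consumed=${consumed.size}, produced=$produced")
  }
}
```

Because the producer is only pulled when there is downstream demand, the number of produced elements stays close to the number consumed instead of growing without bound.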
  11. Practical example – consuming from Nakadi

    • Send a single HTTP GET request
    • Receive an infinite HTTP response
    • One line = one event batch – need to parse
    • Process batches
  12. Practical example – consuming from Nakadi

        val http = Http(actorSystem)
        // outgoingConnectionHttps expects a host name, not a URL.
        val nakadiConnectionFlow = http.outgoingConnectionHttps("nakadi-url.com", 443)
        val getRequest = HttpRequest(HttpMethods.GET, "/")

        val eventBatchSource: Source[EventBatch, NotUsed] =
          // The stream starts with a single request object ...
          Source.single(getRequest)
            // ... that goes through a connection (i.e. is sent to the server)
            .via(nakadiConnectionFlow)
            .flatMapConcat {
              case response @ HttpResponse(StatusCodes.OK, _, _, _) =>
                response.entity.dataBytes
                  // Decompress deflate-compressed bytes.
                  .via(Deflate.decoderFlow)
                  // Coalesce chunks into a line.
                  .via(Framing.delimiter(ByteString("\n"), Int.MaxValue))
                  // Deserialize JSON.
                  .map(bs => Json.read[EventBatch](bs.utf8String))
              // process erroneous responses
            }

        eventBatchSource.map(...).to(...) // process batches
  13. GraphDSL

        import akka.stream.scaladsl.GraphDSL.Implicits._

        RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
          val A: Outlet[Int]                  = builder.add(Source.single(0)).out
          val B: UniformFanOutShape[Int, Int] = builder.add(Broadcast[Int](2))
          val C: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
          val D: FlowShape[Int, Int]          = builder.add(Flow[Int].map(_ + 1))
          val E: UniformFanOutShape[Int, Int] = builder.add(Balance[Int](2))
          val F: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
          val G: Inlet[Any]                   = builder.add(Sink.foreach(println)).in

                        C     <~      F
          A  ~>  B  ~>  C     ~>      F
                 B  ~>  D  ~>  E  ~>  F
                               E  ~>  G

          ClosedShape
        })
  14. Integration with Akka actors

    • An actor can be a Source or a Sink
    • The back pressure protocol is implemented with normal actor messages

        class LongCounter extends ActorPublisher[Long] {
          private var counter = 0L

          override def receive: Receive = {
            case ActorPublisherMessage.Request(_) =>
              // Emit only while there is outstanding demand
              // (`0 to n` would emit n + 1 elements, one more than requested).
              while (isActive && totalDemand > 0) {
                counter += 1
                onNext(counter)
              }
            case ActorPublisherMessage.Cancel =>
              context.stop(self)
          }
        }
  15. Conclusion

    • Akka Streams – a way to build arbitrarily complex type-safe data processing pipelines
    • Complex inside, but the interface is reasonably simple
    • Gives control over execution, including back pressure and asynchronous execution
    • Don’t misuse it: it might not be suitable for every task
  16. Built on top of Akka Streams

    • Akka HTTP – HTTP client and server: http://doc.akka.io/docs/akka-http/current/scala.html
    • Alpakka – enterprise integration patterns (like Apache Camel) (WIP): http://developer.lightbend.com/docs/alpakka/current/