Akka Streams. Introduction

A talk about Akka Streams given at Zalando Finland on 09.03.2017.

Ivan Yurchenko

March 09, 2017

Transcript

  1. Motivation
     • Data processing is often a pipeline of stages
     • The pipeline might be complex: asynchronous stages running at different speeds, I/O, a non-trivial topology (merges, broadcasts, etc.)
     • This implies buffering, queues, congestion control, etc., which can be difficult to get right
     • Actor systems are technically a good fit, but they are quite low-level, which makes them bug-prone and heavy on boilerplate
     • High-level programming libraries (Rx*), frameworks (Apache Camel) and systems (Apache Storm, Twitter Heron, Apache Flink, etc.) exist
  2. Akka Streams
     • A way to build arbitrarily complex, type-safe data processing pipelines
     • Pipelines consist of stages
     • Stages are composable and reusable
     • Stages might be complex and consist of smaller sub-pipelines
     • Stages can be executed asynchronously (in different ExecutionContexts)
     • Not distributed [yet]
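
To make "composable and reusable" concrete, here is a minimal sketch. It assumes the Akka 2.5-era Scala API used elsewhere in this talk (`ActorMaterializer`); the names `ReusableStages`, `normalize` and `normalizeNonEmpty` are invented for illustration and do not come from the slides. A `Flow` is defined once as a plain value, extended into a second `Flow`, and then plugged into a pipeline with `via`.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object ReusableStages {
  // A reusable stage (hypothetical name): trims and upper-cases every element.
  val normalize: Flow[String, String, NotUsed] =
    Flow[String].map(_.trim).map(_.toUpperCase)

  // A second stage composed from the first one: additionally drops empty strings.
  val normalizeNonEmpty: Flow[String, String, NotUsed] =
    normalize.filter(_.nonEmpty)

  // Runs the composed stage over the input and collects the results.
  def run(input: List[String]): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("reusable-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val resultF = Source(input).via(normalizeNonEmpty).runWith(Sink.seq)
      Await.result(resultF, 5.seconds)
    } finally {
      system.terminate()
    }
  }
}
```

The same `normalize` value could be attached to any other source of strings; nothing runs until the pipeline is materialized.
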
  3. Akka Streams basics
     In general: data processing means passing data through an arbitrarily complex graph of transformations/actions
     Most common: Source → Flow → … → Flow → Sink
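  4. Akka Streams basics

         import akka.NotUsed
         import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

         // Explicit form: the transformation is a separate Flow attached with `via`.
         val helloWorldStream1: RunnableGraph[NotUsed] =
           Source.single("Hello world")
             .via(Flow[String].map(s => s.toUpperCase()))
             .to(Sink.foreach(println))

         // Shorthand: `map` directly on the Source builds the same pipeline.
         val helloWorldStream2: RunnableGraph[NotUsed] =
           Source.single("Hello world")
             .map(s => s.toUpperCase())
             .to(Sink.foreach(println))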
  5. Materialization
     Materializer (interface) -- ActorMaterializer (implementation)

         import akka.actor.ActorSystem
         import akka.stream.ActorMaterializer

         implicit val actorSystem = ActorSystem("akka-streams-example")
         implicit val materializer = ActorMaterializer()

         helloWorldStream1.run() // prints: HELLO WORLD
  6. Lots of stages out of the box
     Source: fromIterator, single, repeat, cycle, tick, fromFuture, unfold, empty, failed, actorPublisher, actorRef, queue, fromPath, ...
     Sink: head, headOption, last, lastOption, ignore, cancelled, seq, foreach, foreachParallel, queue, fold, reduce, actorRef, actorRefWithAck, actorSubscriber, toPath, ...
     Flow: map, mapAsync, mapConcat, statefulMapConcat, filter, grouped, sliding, scan, scanAsync, fold, foldAsync, take, takeWhile, drop, dropWhile, recover, recoverWith, throttle, intersperse, limit, delay, buffer, monitor, ...
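
As an illustration of how a few of these built-in stages combine, the following sketch chains `filter`, `map` and `grouped` and collects the output with `Sink.seq`. It assumes the Akka 2.5-era API (`ActorMaterializer`); the object name `BuiltInStages` and the concrete numbers are made up for the example.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object BuiltInStages {
  // Keeps the even numbers of 1..10, multiplies them by 10, and emits them in groups of two.
  def run(): Seq[Seq[Int]] = {
    implicit val system: ActorSystem = ActorSystem("built-in-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val resultF =
        Source(1 to 10)
          .filter(_ % 2 == 0) // 2, 4, 6, 8, 10
          .map(_ * 10)        // 20, 40, 60, 80, 100
          .grouped(2)         // groups of at most two elements
          .runWith(Sink.seq)  // materializes a Future with all emitted groups
      Await.result(resultF, 5.seconds)
    } finally {
      system.terminate()
    }
  }
}
```
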
  7. Materialized values
     • A materialized value is what we get when a stream is materialized by a Materializer
     • It is not the result of the stream (a stream might not even have a result as such)
     • Each stage creates its own materialized value
     • It’s up to us which one we want to keep at the end
  8. Materialized values
     NotUsed – a materialized value, but not really useful

         val helloWorldStream1: RunnableGraph[NotUsed] =
           Source.single("Hello world")
             .via(Flow[String].map(s => s.toUpperCase()))
             .to(Sink.foreach(println))

         val materializedValue: NotUsed = helloWorldStream1.run()

     Future[Done] – much more useful

         val helloWorldStream2: RunnableGraph[Future[Done]] =
           Source.single("Hello world")
             .map(s => s.toUpperCase())
             .toMat(Sink.foreach(println))(Keep.right)

         val doneF: Future[Done] = helloWorldStream2.run()
         doneF.onComplete { … }
  9. Kill switches

         val stream: RunnableGraph[(UniqueKillSwitch, Future[Done])] =
           Source.single("Hello world")
             .map(s => s.toUpperCase())
             .viaMat(KillSwitches.single)(Keep.right)
             .toMat(Sink.foreach(println))(Keep.both)

         val (killSwitch, doneF): (UniqueKillSwitch, Future[Done]) = stream.run()

         killSwitch.shutdown()
         // or: killSwitch.abort(new Exception("Exception from KillSwitch"))
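
A kill switch matters most for streams that would otherwise never end. The sketch below is a hedged variation on the slide's code, not part of the talk: it swaps in an endless `Source.tick` and shows that `shutdown()` completes the stream from the outside. It assumes the Akka 2.5-era API; the object name `KillSwitchSketch`, the tick interval and the timeout are arbitrary choices.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, KillSwitches}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object KillSwitchSketch {
  def run(): Done = {
    implicit val system: ActorSystem = ActorSystem("kill-switch-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      // An endless source: left alone, this stream never completes.
      val (killSwitch, doneF) =
        Source.tick(0.millis, 10.millis, "tick")
          .viaMat(KillSwitches.single)(Keep.right) // keep the kill switch
          .toMat(Sink.ignore)(Keep.both)           // and the sink's Future[Done]
          .run()

      killSwitch.shutdown()          // complete the stream from the outside
      Await.result(doneF, 5.seconds) // the Future completes once the stream has stopped
    } finally {
      system.terminate()
    }
  }
}
```
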
  10. Back pressure
      • Different speeds of stages (producer/consumer) cause problems
      • We know how to deal with these problems
      • Back pressure – a mechanism for the consumer to signal to the producer how much incoming data it can accept
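
The idea of a consumer signalling its capacity can be sketched without Akka at all. The following is a deliberately simplified, synchronous model in plain Scala, with hypothetical names (`DemandSketch`, `CountingProducer`, `consume`); Akka Streams itself does this asynchronously, following the Reactive Streams protocol, so treat it as an illustration of the principle rather than of the library's internals. The producer emits only as many elements as the consumer has requested.

```scala
object DemandSketch {
  // Hypothetical producer: emits consecutive integers, but only as many as requested.
  final class CountingProducer {
    private var next = 0

    def request(n: Int): List[Int] = {
      val batch = List.range(next, next + n)
      next += n
      batch
    }
  }

  // Hypothetical consumer: collects `total` elements while never asking for
  // more than `capacity` at once. Returns the elements and the largest batch seen.
  def consume(producer: CountingProducer, total: Int, capacity: Int): (List[Int], Int) = {
    var received = List.empty[Int]
    var maxBatch = 0
    while (received.size < total) {
      // The demand signal: "I can take this many more elements right now".
      val demand = math.min(capacity, total - received.size)
      val batch = producer.request(demand)
      maxBatch = math.max(maxBatch, batch.size)
      received = received ++ batch
    }
    (received, maxBatch)
  }
}
```

Because the producer never emits unrequested elements, a slow consumer cannot be flooded; it simply requests less often or in smaller amounts.
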
  11. Practical example – consuming from Nakadi
      • Send a single HTTP GET request
      • Receive an infinite HTTP response
      • One line = one event batch, which needs to be parsed
      • Process batches
  12. Practical example – consuming from Nakadi

          val http = Http(actorSystem)
          // outgoingConnectionHttps takes a host name, not a URL with a scheme.
          val nakadiConnectionFlow = http.outgoingConnectionHttps("nakadi-url.com", 443)
          val getRequest = HttpRequest(HttpMethods.GET, "/")

          val eventBatchSource: Source[EventBatch, NotUsed] =
            // The stream starts with a single request object ...
            Source.single(getRequest)
              // ... that goes through a connection (i.e. is sent to the server)
              .via(nakadiConnectionFlow)
              .flatMapConcat {
                case response @ HttpResponse(StatusCodes.OK, _, _, _) =>
                  response.entity.dataBytes
                    // Decompress deflate-compressed bytes.
                    .via(Deflate.decoderFlow)
                    // Coalesce chunks into a line.
                    .via(Framing.delimiter(ByteString("\n"), Int.MaxValue))
                    // Deserialize JSON.
                    .map(bs => Json.read[EventBatch](bs.utf8String))
                // process erroneous responses
              }

          eventBatchSource.map(...).to(...) // process batches
  13. GraphDSL

          import akka.stream.scaladsl.GraphDSL.Implicits._

          RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
            val A: Outlet[Int] = builder.add(Source.single(0)).out
            val B: UniformFanOutShape[Int, Int] = builder.add(Broadcast[Int](2))
            val C: UniformFanInShape[Int, Int] = builder.add(Merge[Int](2))
            val D: FlowShape[Int, Int] = builder.add(Flow[Int].map(_ + 1))
            val E: UniformFanOutShape[Int, Int] = builder.add(Balance[Int](2))
            val F: UniformFanInShape[Int, Int] = builder.add(Merge[Int](2))
            val G: Inlet[Any] = builder.add(Sink.foreach(println)).in

                      C <~ F
            A ~> B ~> C ~> F
                 B ~> D ~> E ~> F
                           E ~> G

            ClosedShape
          })
  14. Integration with Akka actors
      • An actor can be a Source or a Sink
      • The back pressure protocol is implemented with normal actor messages

          class LongCounter extends ActorPublisher[Long] {
            private var counter = 0L

            override def receive: Receive = {
              case ActorPublisherMessage.Request(n) =>
                // Emit exactly the n requested elements, never more than the demand.
                for (_ <- 1L to n) {
                  counter += 1
                  onNext(counter)
                }
              case ActorPublisherMessage.Cancel =>
                context.stop(self)
            }
          }
  15. Conclusion
      • Akka Streams – a way to build arbitrarily complex, type-safe data processing pipelines
      • Complex inside, but the interface is reasonably simple
      • Gives control over execution, including back pressure and asynchronous execution
      • Don’t misuse it: it might not be suitable for the task
  16. Built on top of Akka Streams
      • Akka HTTP – HTTP client and server
        http://doc.akka.io/docs/akka-http/current/scala.html
      • Alpakka – enterprise integration patterns (like Apache Camel) (WIP)
        http://developer.lightbend.com/docs/alpakka/current/