Slide 1

Akka Streams Introduction
Ivan Yurchenko
@ivan0yu

Slide 2

Motivation

● Data processing is often a pipeline of stages
● Pipelines might be complex: asynchronous stages of different speeds, I/O, non-trivial topology (merges, broadcasts, etc.)
● This implies buffering, queues, congestion control, etc., which are difficult to get right
● Actor systems are technically a good fit, but quite low-level => bug-prone and lots of boilerplate
● High-level programming libraries (Rx*), frameworks (Apache Camel) and systems (Apache Storm, Twitter Heron, Apache Flink, etc.) exist

Slide 3

Akka Streams

● A way to build arbitrarily complex, type-safe data processing pipelines
● Pipelines consist of stages
● Stages are composable and reusable
● Stages might be complex and consist of smaller sub-pipelines
● Stages can be executed asynchronously (in different ExecutionContexts), as sketched below
● Not distributed [yet]
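A minimal sketch (mine, not from the slides) of the asynchronous execution point: the .async marker puts everything before it into its own actor, so the two map stages below can run concurrently.

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source

implicit val actorSystem = ActorSystem("async-example")
implicit val materializer = ActorMaterializer()

Source(1 to 10)
  .map(_ * 2)
  .async                 // the section above runs in its own actor
  .map(_ + 1)            // this stage runs concurrently with the one above
  .runForeach(println)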

Slide 4

Akka Streams basics

In general: data processing is passing data through an arbitrarily complex graph of transformations/actions.

Most common: Source → Flow → … → Flow → Sink

Slide 5

Akka Streams basics

import akka.NotUsed
import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

val helloWorldStream1: RunnableGraph[NotUsed] =
  Source.single("Hello world")
    .via(Flow[String].map(s => s.toUpperCase()))
    .to(Sink.foreach(println))

val helloWorldStream2: RunnableGraph[NotUsed] =
  Source.single("Hello world")
    .map(s => s.toUpperCase())
    .to(Sink.foreach(println))

Slide 6

Materialization

Materializer (the interface) -- ActorMaterializer (the implementation)

implicit val actorSystem = ActorSystem("akka-streams-example")
implicit val materializer = ActorMaterializer()

helloWorldStream.run()
// prints: HELLO WORLD

Slide 7

Lots of stages out of the box

Source: fromIterator, single, repeat, cycle, tick, fromFuture, unfold, empty, failed, actorPublisher, actorRef, queue, fromPath, ...

Sink: head, headOption, last, lastOption, ignore, cancelled, seq, foreach, foreachParallel, queue, fold, reduce, actorRef, actorRefWithAck, actorSubscriber, toPath, ...

Flow: map, mapAsync, mapConcat, statefulMapConcat, filter, grouped, sliding, scan, scanAsync, fold, foldAsync, take, takeWhile, drop, dropWhile, recover, recoverWith, throttle, intersperse, limit, delay, buffer, monitor, ...
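A minimal sketch (mine, not from the slides) combining a few of these built-in stages: filter and map from Flow, plus Sink.fold to produce a Future[Int] result.

import scala.concurrent.Future
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}

val sumOfDoubledEvens: RunnableGraph[Future[Int]] =
  Source(1 to 100)                                    // built-in source
    .filter(_ % 2 == 0)                               // built-in flow stages
    .map(_ * 2)
    .toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.right) // built-in sink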

Slide 8

Composition and reusability
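The original slide illustrates this with a diagram. As a minimal sketch (mine, not from the slides): stages are plain values, so they can be defined once and reused in several pipelines.

import scala.concurrent.Future
import akka.{Done, NotUsed}
import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

// Reusable stages, defined once ...
val toUpper: Flow[String, String, NotUsed] = Flow[String].map(_.toUpperCase)
val printSink: Sink[String, Future[Done]] = Sink.foreach(println)

// ... and composed into different pipelines.
val greetings: RunnableGraph[NotUsed] =
  Source(List("hello", "world")).via(toUpper).to(printSink)
val farewell: RunnableGraph[NotUsed] =
  Source.single("goodbye").via(toUpper).to(printSink)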

Slide 9

Materialized values

● Something that we get when a stream is materialized by a Materializer
● Not the result of the stream (a stream might not even have a result as such)
● Each stage creates its own materialized value
● It’s up to us which one we want to have at the end

Slide 10

Materialized values

NotUsed – a materialized value, but not really useful:

val helloWorldStream1: RunnableGraph[NotUsed] =
  Source.single("Hello world")
    .via(Flow[String].map(s => s.toUpperCase()))
    .to(Sink.foreach(println))

val materializedValue: NotUsed = helloWorldStream1.run()

Future[Done] – much more useful:

val helloWorldStream2: RunnableGraph[Future[Done]] =
  Source.single("Hello world")
    .map(s => s.toUpperCase())
    .toMat(Sink.foreach(println))(Keep.right)

val doneF: Future[Done] = helloWorldStream2.run()
doneF.onComplete { … }

Slide 11

Materialized values in composition
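The original slide shows this as a diagram. A minimal sketch (mine, not from the slides): when two stages are combined, both materialized values are available, and the Keep.* combinators select which one(s) the composed graph exposes.

import scala.concurrent.{Future, Promise}
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}

val source: Source[Int, Promise[Option[Int]]] = Source.maybe[Int] // materializes a Promise
val sink: Sink[Int, Future[Int]] = Sink.head[Int]                 // materializes a Future

val left: RunnableGraph[Promise[Option[Int]]] = source.toMat(sink)(Keep.left)
val right: RunnableGraph[Future[Int]] = source.toMat(sink)(Keep.right)
val both: RunnableGraph[(Promise[Option[Int]], Future[Int])] = source.toMat(sink)(Keep.both)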

Slide 12

Kill switches

val stream: RunnableGraph[(UniqueKillSwitch, Future[Done])] =
  Source.single("Hello world")
    .map(s => s.toUpperCase())
    .viaMat(KillSwitches.single)(Keep.right)
    .toMat(Sink.foreach(println))(Keep.both)

val (killSwitch, doneF): (UniqueKillSwitch, Future[Done]) = stream.run()

killSwitch.shutdown()
// or
killSwitch.abort(new Exception("Exception from KillSwitch"))

Slide 14

Back pressure

● Different speeds of stages (producer/consumer) cause problems
● We know how to deal with these problems
● Back pressure – a mechanism for the consumer to signal to the producer how much capacity it has for incoming data

Slide 15

Back pressure
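The original slide shows back pressure as a diagram. A minimal runnable sketch (mine, not from the slides; assumes the implicit materializer from slide 6): the throttled section signals demand upstream, so the unbounded producer emits only as fast as the consumer accepts, rather than buffering without limit.

import scala.concurrent.duration._
import akka.stream.ThrottleMode
import akka.stream.scaladsl.Source

Source.repeat("tick")                               // an infinite, as-fast-as-possible producer
  .throttle(1, 100.millis, 1, ThrottleMode.Shaping) // downstream accepts 10 elements/second
  .runForeach(println)                              // the producer is back-pressured accordingly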

Slide 16

Practical example – consuming from Nakadi

● Send a single HTTP GET request
● Receive an infinite HTTP response
● One line = one event batch – needs to be parsed
● Process batches

Slide 17

Practical example – consuming from Nakadi

val http = Http(actorSystem)
val nakadiConnectionFlow = http.outgoingConnectionHttps("nakadi-url.com", 443)
val getRequest = HttpRequest(HttpMethods.GET, "/")

val eventBatchSource: Source[EventBatch, NotUsed] =
  // The stream starts with a single request object ...
  Source.single(getRequest)
    // ... that goes through a connection (i.e. is sent to the server).
    .via(nakadiConnectionFlow)
    .flatMapConcat {
      case response @ HttpResponse(StatusCodes.OK, _, _, _) =>
        response.entity.dataBytes
          // Decompress deflate-compressed bytes.
          .via(Deflate.decoderFlow)
          // Coalesce chunks into lines.
          .via(Framing.delimiter(ByteString("\n"), Int.MaxValue))
          // Deserialize JSON (Json.read here is a project-specific helper).
          .map(bs => Json.read[EventBatch](bs.utf8String))
      // process erroneous responses
    }

eventBatchSource.map(...).to(...) // process batches

Slide 18

GraphDSL

Slide 19

GraphDSL

import akka.stream.scaladsl.GraphDSL.Implicits._

RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
  val A: Outlet[Int]                  = builder.add(Source.single(0)).out
  val B: UniformFanOutShape[Int, Int] = builder.add(Broadcast[Int](2))
  val C: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
  val D: FlowShape[Int, Int]          = builder.add(Flow[Int].map(_ + 1))
  val E: UniformFanOutShape[Int, Int] = builder.add(Balance[Int](2))
  val F: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
  val G: Inlet[Any]                   = builder.add(Sink.foreach(println)).in

  C <~ F
  A ~> B ~> C ~> F
       B ~> D ~> E ~> F
                 E ~> G

  ClosedShape
})

Slide 20

Integration with Akka actors

● An actor can be a Source or a Sink
● The back pressure protocol – normal actor messages

class LongCounter extends ActorPublisher[Long] {
  private var counter = 0L

  override def receive: Receive = {
    case ActorPublisherMessage.Request(n) =>
      // Emit exactly the n requested elements (n is a Long).
      for (_ <- 1L to n) {
        counter += 1
        onNext(counter)
      }

    case ActorPublisherMessage.Cancel =>
      context.stop(self)
  }
}
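A minimal sketch (mine, not from the slides) of plugging this actor in as a Source, via the actorPublisher stage listed on the earlier slide:

import akka.actor.{ActorRef, Props}
import akka.stream.scaladsl.Source

// Materializes the ActorRef of the underlying LongCounter actor.
val counterSource: Source[Long, ActorRef] =
  Source.actorPublisher[Long](Props[LongCounter])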

Slide 21

Conclusion

● Akka Streams – a way to build arbitrarily complex, type-safe data processing pipelines
● Complex inside, but the interface is reasonably simple
● Gives control over execution, including back pressure and asynchronous execution
● Don’t misuse it – it might not be suitable for every task

Slide 22

Where to get information

● The official documentation: http://doc.akka.io/docs/akka/current/scala/stream/index.html
● Akka team blog: http://blog.akka.io/

Slide 23

Built on top of Akka Streams

● Akka HTTP – HTTP client and server: http://doc.akka.io/docs/akka-http/current/scala.html
● Alpakka – enterprise integration patterns (like Apache Camel) (WIP): http://developer.lightbend.com/docs/alpakka/current/

Slide 24

Blog post version of this presentation:
“About Akka Streams” – https://tech.zalando.com/blog/about-akka-streams/

Slide 25

Questions?