Akka Streams. Introduction (Zalando version)

A talk about Akka Streams given at the SPb Scala meetup (https://www.meetup.com/ScalaSpb/events/243298982/) on 04.10.2017.

Ivan Yurchenko

October 05, 2017

Transcript

  1. ABOUT ME

    • Ivan Yurchenko.
    • Currently work at Zalando in Helsinki.
    • Have been working in several teams: mobile backend, search, domain knowledge service.
    • Mostly use Scala now.
    • Contacts:
      ◦ [email protected]
      ◦ https://ivanyu.me/
      ◦ https://linkedin.com/in/ivanyurchenko/
      ◦ https://twitter.com/ivan0yu
  2. AT A GLANCE: EUROPE’S LARGEST ONLINE FASHION RETAILER

    • 15 countries
    • 21 million active customers
    • 200 million visits per month
    • ~3.64 billion € revenue 2016
    • 13.000+ employees
    • 100+ nationalities
    • Tech HQ in Berlin
    • 1800 employees in Tech
    Visit us: jobs.zalando.com
  3. ZALANDO HELSINKI TECH HUB

    Zalando Helsinki site was opened in August 2015, moved to new office in August 2016.
    BUILDING OUR ECOMMERCE PLATFORM
    • AWS, Microservices, Scala, Android and iOS
    • 108 employees
    • Autonomous delivery teams working with modern technologies
    • 12
    • 31 Nationalities
    • Our office is located in KAMPPI
  4. MOTIVATION

    • Data processing is often a pipeline of stages
    • The pipeline might be complex: asynchronous stages of different speeds, I/O, complex topology (merges, broadcasts, etc.)
    • This implies buffering, queues, congestion control, etc., which might be difficult to get right
    • Actor systems are technically good for this, but quite low-level => bug-prone and lots of boilerplate
    • High-level programming libraries (Rx*), frameworks (Apache Camel) and systems (Apache Storm, Twitter Heron, Apache Flink, etc.) exist
  5. AKKA STREAMS

    • A way to build arbitrarily complex type-safe data processing pipelines
    • Pipelines consist of stages
    • Stages are composable and reusable
    • Stages might themselves be complex and consist of smaller sub-pipelines
    • Stages can be executed asynchronously (in different ExecutionContexts)
    • Not distributed [yet]
    • New: compatible with Java 9’s java.util.concurrent.Flow
  6. AKKA STREAMS BASICS

    In general: data processing is passing data through an arbitrarily complex graph of transformations/actions.
    Most common: Source → Flow → … → Flow → Sink
  7. AKKA STREAMS BASICS

        val helloWorldStream1: RunnableGraph[NotUsed] =
          Source.single("Hello world")
            .via(Flow[String].map(s => s.toUpperCase()))
            .to(Sink.foreach(println))

        val helloWorldStream2: RunnableGraph[NotUsed] =
          Source.single("Hello world")
            .map(s => s.toUpperCase())
            .to(Sink.foreach(println))
  8. MATERIALIZATION

    Materializer (interface) -- ActorMaterializer (implementation)

        implicit val actorSystem = ActorSystem("akka-streams-example")
        implicit val materializer = ActorMaterializer()

        // helloWorldStream1 is the graph defined on the previous slide.
        helloWorldStream1.run() // prints HELLO WORLD
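A minimal sketch of what materialization means in practice: a `RunnableGraph` is only a blueprint, and each `run()` materializes it again with its own resources. The object name, the actor system name and the sample data are assumptions made for illustration, and the code uses the `ActorMaterializer` API from the slides (Akka 2.5 era).

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}

import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical example object, not part of the talk.
object MaterializeTwiceSketch {
  // Runs the same blueprint twice and returns both results.
  def run(): (Seq[Int], Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("materialize-twice")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      // Only a description: nothing is executed at this point.
      val blueprint: RunnableGraph[Future[immutable.Seq[Int]]] =
        Source(1 to 3).map(_ * 2).toMat(Sink.seq[Int])(Keep.right)

      // Each run() is a separate, independent materialization.
      val first = Await.result(blueprint.run(), 5.seconds)
      val second = Await.result(blueprint.run(), 5.seconds)
      (first, second)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Both runs see the complete input because they do not share any state.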
  9. LOTS OF STAGES OUT OF THE BOX

    Source: fromIterator, single, repeat, cycle, tick, fromFuture, unfold, empty, failed, actorPublisher, actorRef, queue, fromPath, ...
    Sink: head, headOption, last, lastOption, ignore, cancelled, seq, foreach, foreachParallel, queue, fold, reduce, actorRef, actorRefWithAck, actorSubscriber, toPath, ...
    Flow: map, mapAsync, mapConcat, statefulMapConcat, filter, grouped, sliding, scan, scanAsync, fold, foldAsync, take, takeWhile, drop, dropWhile, recover, recoverWith, throttle, intersperse, limit, delay, buffer, monitor, ...
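As an illustration of how a few of the built-in stages listed above compose, here is a small sketch that chains `filter`, `map` and `grouped` and collects the output with `Sink.seq`. The object name and the numbers are made up for the example.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the talk.
object BuiltInStagesSketch {
  def run(): Seq[Seq[Int]] = {
    implicit val system: ActorSystem = ActorSystem("built-in-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val resultF = Source(1 to 10)
        .filter(_ % 2 == 0) // keeps 2, 4, 6, 8, 10
        .map(_ * 10)        // 20, 40, 60, 80, 100
        .grouped(2)         // groups of at most two elements
        .runWith(Sink.seq)  // collects all groups into one sequence

      Await.result(resultF, 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```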
  10. MATERIALIZED VALUES

    • A materialized value is something that we get when a stream is materialized by a Materializer
    • Not the result of a stream (a stream might not even have a result as such)
    • Each stage creates its own materialized value
    • It’s up to us which one we want to have at the end
  11. MATERIALIZED VALUES

    NotUsed – materialized value, but not really useful

        val helloWorldStream1: RunnableGraph[NotUsed] =
          Source.single("Hello world")
            .via(Flow[String].map(s => s.toUpperCase()))
            .to(Sink.foreach(println))

        val materializedValue: NotUsed = helloWorldStream1.run()

    Future[Done] – much more useful

        val helloWorldStream2: RunnableGraph[Future[Done]] =
          Source.single("Hello world")
            .map(s => s.toUpperCase())
            .toMat(Sink.foreach(println))(Keep.right)

        val doneF: Future[Done] = helloWorldStream2.run()
        doneF.onComplete { … }
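The choice between materialized values can be sketched with a sink that materializes something more interesting than `Done`. In the hedged example below, `Sink.fold` materializes a `Future[Int]` holding the sum; `Keep.left` would discard it in favour of the source's `NotUsed`, while `Keep.right` keeps it. The object name and the data are assumptions for illustration.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical example object, not part of the talk.
object MaterializedValueSketch {
  def run(): Int = {
    implicit val system: ActorSystem = ActorSystem("materialized-values")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val source: Source[Int, NotUsed] = Source(1 to 5)
      val sumSink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

      // Keep.left keeps the source's NotUsed: the sum is not accessible.
      val withoutSum: RunnableGraph[NotUsed] = source.toMat(sumSink)(Keep.left)

      // Keep.right keeps the sink's Future[Int]: the sum is accessible.
      val withSum: RunnableGraph[Future[Int]] = source.toMat(sumSink)(Keep.right)

      Await.result(withSum.run(), 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```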
  12. KILL SWITCHES

        val stream: RunnableGraph[(UniqueKillSwitch, Future[Done])] =
          Source.single("Hello world")
            .map(s => s.toUpperCase())
            .viaMat(KillSwitches.single)(Keep.right)
            .toMat(Sink.foreach(println))(Keep.both)

        val (killSwitch, doneF): (UniqueKillSwitch, Future[Done]) = stream.run()

        killSwitch.shutdown()
        // or
        killSwitch.abort(new Exception("Exception from KillSwitch"))
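A kill switch matters most for streams that never finish on their own. The sketch below, with an assumed object name and arbitrary timings, runs an endless `Source.tick` and stops it from the outside; `shutdown()` completes the stream normally, so the sink's `Future[Done]` succeeds.

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, KillSwitches}
import akka.stream.scaladsl.{Keep, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the talk.
object KillSwitchSketch {
  def run(): Done = {
    implicit val system: ActorSystem = ActorSystem("kill-switch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      // An endless source: it emits "tick" every 10 ms until stopped.
      val (killSwitch, doneF) =
        Source.tick(0.millis, 10.millis, "tick")
          .viaMat(KillSwitches.single)(Keep.right)
          .toMat(Sink.ignore)(Keep.both)
          .run()

      Thread.sleep(100)     // let the stream run for a moment
      killSwitch.shutdown() // complete the stream from the outside

      Await.result(doneF, 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```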
  13. BACK PRESSURE

    • Different speeds of stages (producer/consumer) cause problems
    • We know how to deal with these problems
    • Back pressure – a mechanism for the consumer to signal to the producer how much capacity it has for incoming data
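A hedged sketch of back pressure at work: a fast in-memory source feeds a consumer limited to one element per 10 ms by `throttle`. The explicit `buffer` uses `OverflowStrategy.backpressure`, so when it is full it stops pulling from the source instead of dropping elements, and every element arrives in order. The object name, buffer size and rate are arbitrary choices for the example.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, OverflowStrategy, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the talk.
object BackPressureSketch {
  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("back-pressure")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val resultF = Source(1 to 20) // fast producer
        // Small buffer; when full, the upstream is slowed down, nothing is dropped.
        .buffer(4, OverflowStrategy.backpressure)
        // Slow consumer side: at most one element per 10 ms.
        .throttle(1, 10.millis, 1, ThrottleMode.Shaping)
        .runWith(Sink.seq)

      Await.result(resultF, 10.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Replacing the strategy with a dropping one such as `OverflowStrategy.dropHead` would trade completeness for never slowing the producer down.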
  14. PRACTICAL EXAMPLE – CONSUMING FROM NAKADI

    • Send a single HTTP GET request
    • Receive an infinite HTTP response
    • One line = one event batch – need to parse
    • Process batches
  15. PRACTICAL EXAMPLE – CONSUMING FROM NAKADI

        val http = Http(actorSystem)
        // outgoingConnectionHttps expects a host name, not a URL.
        val nakadiConnectionFlow = http.outgoingConnectionHttps("nakadi-url.com", 443)
        val getRequest = HttpRequest(HttpMethods.GET, "/")

        val eventBatchSource: Source[EventBatch, NotUsed] =
          // The stream starts with a single request object ...
          Source.single(getRequest)
            // ... that goes through a connection (i.e. is sent to the server)
            .via(nakadiConnectionFlow)
            .flatMapConcat {
              case response @ HttpResponse(StatusCodes.OK, _, _, _) =>
                response.entity.dataBytes
                  // Decompress deflate-compressed bytes.
                  .via(Deflate.decoderFlow)
                  // Coalesce chunks into a line.
                  .via(Framing.delimiter(ByteString("\n"), Int.MaxValue))
                  // Deserialize JSON.
                  .map(bs => Json.read[EventBatch](bs.utf8String))
              // process erroneous responses
            }

        eventBatchSource.map(...).to(...) // process batches
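The "coalesce chunks into a line" step can be tried in isolation. The sketch below feeds `Framing.delimiter` with byte chunks that are deliberately split in the middle of lines, the way a network response may arrive, and gets whole lines back. The object name, the sample JSON strings and the 1024-byte frame limit are assumptions for the example.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Framing, Sink, Source}
import akka.util.ByteString

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the talk.
object FramingSketch {
  def run(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("framing")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      // Chunk boundaries do not match line boundaries.
      val chunks = List(
        ByteString("{\"a\":"),
        ByteString("1}\n{\"b\""),
        ByteString(":2}\n")
      )

      val linesF = Source(chunks)
        // Re-chunk the byte stream so that each element is exactly one line.
        .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = 1024))
        .map(_.utf8String)
        .runWith(Sink.seq)

      Await.result(linesF, 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = run().foreach(println)
}
```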
  16. GraphDSL

        import akka.stream.scaladsl.GraphDSL.Implicits._

        RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
          val A: Outlet[Int]                  = builder.add(Source.single(0)).out
          val B: UniformFanOutShape[Int, Int] = builder.add(Broadcast[Int](2))
          val C: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
          val D: FlowShape[Int, Int]          = builder.add(Flow[Int].map(_ + 1))
          val E: UniformFanOutShape[Int, Int] = builder.add(Balance[Int](2))
          val F: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
          val G: Inlet[Any]                   = builder.add(Sink.foreach(println)).in

                        C     <~      F
          A  ~>  B  ~>  C     ~>      F
                 B  ~>  D  ~>  E  ~>  F
                               E  ~>  G

          ClosedShape
        })
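A smaller graph may be easier to experiment with than the one on the slide. This sketch broadcasts each element to two branches, transforms them differently and merges the branches again; passing the sink to `GraphDSL.create` exposes its materialized value. All names and numbers are assumptions for illustration, and newer Akka versions offer other entry points for the same construction.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ClosedShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Merge, RunnableGraph, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the talk.
object GraphDslSketch {
  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("graph-dsl")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val graph = RunnableGraph.fromGraph(
        GraphDSL.create(Sink.seq[Int]) { implicit builder => out =>
          import GraphDSL.Implicits._

          val bcast = builder.add(Broadcast[Int](2)) // one input, two outputs
          val merge = builder.add(Merge[Int](2))     // two inputs, one output

          Source(1 to 3) ~> bcast
          bcast ~> Flow[Int].map(_ * 10)  ~> merge
          bcast ~> Flow[Int].map(_ * 100) ~> merge
          merge.out ~> out.in

          ClosedShape
        })

      // Merge does not guarantee an order between branches, so sort the result.
      Await.result(graph.run(), 5.seconds).sorted
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```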
  17. INTEGRATION WITH AKKA ACTORS

    • An actor can be a Source or a Sink
    • The back pressure protocol – normal actor messages

        class LongCounter extends ActorPublisher[Long] {
          private var counter = 0L

          override def receive: Receive = {
            // The subscriber signals demand for n more elements.
            case ActorPublisherMessage.Request(n) =>
              // Emit exactly n elements, never more than was requested.
              for (_ <- 1L to n) {
                counter += 1
                onNext(counter)
              }
            // The subscriber cancelled the stream: stop the actor.
            case ActorPublisherMessage.Cancel =>
              context.stop(self)
          }
        }
  18. CONCLUSION

    • Akka Streams – a way to build arbitrarily complex type-safe data processing pipelines
    • Complex inside, but the interface is reasonably simple
    • Gives control over execution, including back pressure and asynchronous execution
    • Don’t misuse it: it might not be suitable for the task
  19. BUILT ON TOP OF AKKA STREAMS

    • Akka HTTP – HTTP client and server
      http://doc.akka.io/docs/akka-http/current/scala.html
    • Alpakka – enterprise integration patterns (like Apache Camel) (WIP)
      http://developer.lightbend.com/docs/alpakka/current/
  20. More about Zalando?

    FOLLOW US: #Zelsinki #ZalandoTech
    • LinkedIn: Zalando SE
    • Facebook & Instagram: @insidezalando
    • Twitter: @ZalandoTech
    • Tech Blog: https://jobs.zalando.com/tech/blog/
    CAREERS: https://jobs.zalando.com