Akka Streams. Introduction (Zalando version)

A talk about Akka Streams given at the SPb Scala meetup (https://www.meetup.com/ScalaSpb/events/243298982/) on 04.10.2017.

Ivan Yurchenko

October 05, 2017

Transcript

  1. Akka Streams Introduction
    Ivan Yurchenko

  2. ABOUT ME
    ● Ivan Yurchenko.
    ● Currently work at Zalando in Helsinki.
    ● Have been working in several teams: mobile backend, search, domain knowledge service.
    ● Mostly use Scala now.
    ● Contacts:
    ○ [email protected]
    ○ https://ivanyu.me/
    ○ https://linkedin.com/in/ivanyurchenko/
    ○ https://twitter.com/ivan0yu

  3. AT A GLANCE: EUROPE’S LARGEST ONLINE FASHION RETAILER
    15 countries
    21 million active customers
    200 million visits per month
    ~3.64 billion € revenue 2016
    13.000+ employees
    100+ nationalities
    Tech HQ in Berlin
    1800 employees in Tech
    Visit us: jobs.zalando.com

  4. ZALANDO HELSINKI TECH HUB
    Zalando Helsinki site was opened in August 2015, moved to new office in August 2016.
    BUILDING OUR ECOMMERCE PLATFORM
    AWS, Microservices, Scala, Android and iOS
    108 employees
    12 autonomous delivery teams working with modern technologies
    31 nationalities
    Our office is located in KAMPPI

  5. MOTIVATION
    ● Data processing is often a pipeline of stages
    ● The pipeline might be complex: asynchronous stages of different speeds, I/O, non-trivial topology (merges, broadcasts, etc.)
    ● This implies buffering, queues, congestion control, etc., which can be difficult to get right
    ● Actor systems are technically good for this, but quite low-level => bug-prone and lots of boilerplate
    ● High-level programming libraries (Rx*), frameworks (Apache Camel) and systems (Apache Storm, Twitter Heron, Apache Flink, etc.) exist

  6. AKKA STREAMS
    ● A way to build arbitrarily complex, type-safe data processing pipelines
    ● Pipelines consist of stages
    ● Stages are composable and reusable
    ● Stages might be complex and consist of smaller sub-pipelines
    ● Stages can be executed asynchronously (in different ExecutionContexts)
    ● Not distributed [yet]
    ● New: compatible with Java 9’s java.util.concurrent.Flow
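
    The points about composable, reusable and asynchronous stages can be sketched roughly as follows. This is an illustration only, assuming an Akka 2.5-style API (ActorMaterializer, as used elsewhere in this talk); the object name and the stage names trim, nonEmpty and normalize are hypothetical and not from the slides.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the original talk.
object ReusableStages {
  // Two small stages that can be reused in any pipeline of Strings.
  val trim: Flow[String, String, NotUsed] = Flow[String].map(_.trim)
  val nonEmpty: Flow[String, String, NotUsed] = Flow[String].filter(_.nonEmpty)

  // A bigger stage built from the smaller ones.
  // `.async` marks an asynchronous boundary around this stage.
  val normalize = trim.via(nonEmpty).map(_.toLowerCase).async

  def run(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("reusable-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val resultF = Source(List(" Akka ", "", "  Streams"))
        .via(normalize)
        .runWith(Sink.seq) // collect all elements into a Future[Seq[String]]
      Await.result(resultF, 3.seconds)
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

    Blocking with Await is used here only to keep the sketch short; in real code the Future would normally be composed further.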


  7. AKKA STREAMS BASICS
    In general: data processing is passing data through an arbitrarily complex graph of transformations/actions
    Most common:
    Source → Flow → … → Flow → Sink

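  8. AKKA STREAMS BASICS
    import akka.NotUsed
    import akka.stream.scaladsl.{Flow, RunnableGraph, Sink, Source}

    // Explicit form: a Source, a separate Flow stage, and a Sink.
    val helloWorldStream1: RunnableGraph[NotUsed] =
      Source.single("Hello world")
        .via(Flow[String].map(s => s.toUpperCase()))
        .to(Sink.foreach(println))

    // Shorthand: the transformation is applied directly on the Source.
    val helloWorldStream2: RunnableGraph[NotUsed] =
      Source.single("Hello world")
        .map(s => s.toUpperCase())
        .to(Sink.foreach(println))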

  9. MATERIALIZATION
    Materializer (interface) -- ActorMaterializer (implementation)
    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer

    implicit val actorSystem: ActorSystem = ActorSystem("akka-streams-example")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    helloWorldStream1.run()
    // prints: HELLO WORLD

  10. LOTS OF STAGES OUT OF THE BOX
    Source: fromIterator, single, repeat, cycle, tick, fromFuture, unfold, empty, failed, actorPublisher, actorRef, queue, fromPath, ...
    Sink: head, headOption, last, lastOption, ignore, cancelled, seq, foreach, foreachParallel, queue, fold, reduce, actorRef, actorRefWithAck, actorSubscriber, toPath, ...
    Flow: map, mapAsync, mapConcat, statefulMapConcat, filter, grouped, sliding, scan, scanAsync, fold, foldAsync, take, takeWhile, drop, dropWhile, recover, recoverWith, throttle, intersperse, limit, delay, buffer, monitor, ...
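
    A few of the stages listed above (filter, map, grouped, take, Sink.seq) chained together, as a minimal sketch. It assumes an Akka 2.5-style API with ActorMaterializer; the object name BuiltInStages is hypothetical.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the original talk.
object BuiltInStages {
  def run(): Seq[Seq[Int]] = {
    implicit val system: ActorSystem = ActorSystem("built-in-stages")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val resultF = Source(1 to 10) // a Source backed by an immutable collection
        .filter(_ % 2 == 0)         // keep even numbers: 2, 4, 6, 8, 10
        .map(_ * 10)                // 20, 40, 60, 80, 100
        .grouped(2)                 // groups of at most two elements
        .take(2)                    // only the first two groups
        .runWith(Sink.seq)          // collect the groups into a Future
      Await.result(resultF, 3.seconds)
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = println(run())
}
```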


  11. COMPOSITION AND REUSABILITY

  12. MATERIALIZED VALUES
    ● It’s something that we get when a stream is materialized by a Materializer
    ● Not the result of a stream (a stream might not even have a result as such)
    ● Each stage creates its own materialized value
    ● It’s up to us which one we want to have at the end

  13. MATERIALIZED VALUES
    NotUsed – a materialized value, but not really useful
    import akka.{Done, NotUsed}
    import akka.stream.scaladsl.{Flow, Keep, RunnableGraph, Sink, Source}
    import scala.concurrent.Future

    val helloWorldStream1: RunnableGraph[NotUsed] =
      Source.single("Hello world")
        .via(Flow[String].map(s => s.toUpperCase()))
        .to(Sink.foreach(println)) // `to` keeps the materialized value of the left side
    val materializedValue: NotUsed = helloWorldStream1.run()

    Future[Done] – much more useful
    val helloWorldStream2: RunnableGraph[Future[Done]] =
      Source.single("Hello world")
        .map(s => s.toUpperCase())
        .toMat(Sink.foreach(println))(Keep.right) // keep the Sink's Future[Done]
    val doneF: Future[Done] = helloWorldStream2.run()
    doneF.onComplete { … } // requires an implicit ExecutionContext in scope

  14. MATERIALIZED VALUES IN COMPOSITION
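
    As a rough sketch of how materialized values are chosen when stages are composed: every combination of the same Source and Sink below yields a different materialized type, depending on the Keep function. It assumes an Akka 2.5-style API with ActorMaterializer; the object name and the value 42 are illustrative only.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

// Hypothetical example object, not part of the original talk.
object MaterializedValuesSketch {
  // Source.maybe materializes a Promise that controls what the source emits.
  val source: Source[Int, Promise[Option[Int]]] = Source.maybe[Int]
  // Sink.head materializes a Future of the first element.
  val sink: Sink[Int, Future[Int]] = Sink.head[Int]

  // The same two stages, three different materialized values.
  val keepLeft: RunnableGraph[Promise[Option[Int]]] = source.toMat(sink)(Keep.left)
  val keepRight: RunnableGraph[Future[Int]] = source.toMat(sink)(Keep.right)
  val keepBoth: RunnableGraph[(Promise[Option[Int]], Future[Int])] =
    source.toMat(sink)(Keep.both)

  def run(): Int = {
    implicit val system: ActorSystem = ActorSystem("materialized-values")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val (promise, resultF) = keepBoth.run()
      promise.success(Some(42)) // the source emits 42 and completes
      Await.result(resultF, 3.seconds)
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

    Only keepBoth is run here; keepLeft and keepRight are blueprints shown for their types.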


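  15. KILL SWITCHES
    import akka.Done
    import akka.stream.{KillSwitches, UniqueKillSwitch}
    import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
    import scala.concurrent.Future

    val stream: RunnableGraph[(UniqueKillSwitch, Future[Done])] =
      Source.single("Hello world")
        .map(s => s.toUpperCase())
        .viaMat(KillSwitches.single)(Keep.right) // keep the UniqueKillSwitch
        .toMat(Sink.foreach(println))(Keep.both) // keep the switch and the Future[Done]
    val (killSwitch, doneF): (UniqueKillSwitch, Future[Done]) = stream.run()

    killSwitch.shutdown() // complete the stream
    // or
    killSwitch.abort(new Exception("Exception from KillSwitch")) // fail the stream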

  17. BACK PRESSURE
    ● Different speeds of stages (producer/consumer) cause problems
    ● We know how to deal with these problems
    ● Back pressure – a mechanism for the consumer to signal to the producer its capacity for incoming data
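
    A minimal sketch of a fast producer feeding a slower consumer, assuming an Akka 2.5-style API with ActorMaterializer. The throttle stage stands in for a slow consumer, and the buffer size and rates are arbitrary illustrative values. With OverflowStrategy.backpressure the producer is slowed down instead of elements being dropped.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, OverflowStrategy, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical example object, not part of the original talk.
object BackPressureSketch {
  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("back-pressure-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val resultF = Source(1 to 20) // a producer that could emit as fast as possible
        // When the buffer is full, stop pulling from upstream instead of dropping.
        // Other strategies (dropHead, dropTail, dropBuffer, dropNew, fail) lose data or fail.
        .buffer(4, OverflowStrategy.backpressure)
        // Simulated slow consumer: at most 10 elements per 100 ms.
        .throttle(10, 100.millis, 10, ThrottleMode.Shaping)
        .runWith(Sink.seq)
      Await.result(resultF, 5.seconds)
    } finally system.terminate()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

    Every element arrives in order, because demand is propagated upstream rather than elements being pushed blindly.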


  18. BACK PRESSURE

  19. PRACTICAL EXAMPLE – CONSUMING FROM NAKADI
    ● Send a single HTTP GET request
    ● Receive an infinite HTTP response
    ● One line = one event batch – need to parse
    ● Process batches

  20. PRACTICAL EXAMPLE – CONSUMING FROM NAKADI
    import akka.NotUsed
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.coding.Deflate
    import akka.http.scaladsl.model.{HttpMethods, HttpRequest, HttpResponse, StatusCodes}
    import akka.stream.scaladsl.{Framing, Source}
    import akka.util.ByteString

    // EventBatch and Json are application-specific and not shown here.
    val http = Http(actorSystem)
    // outgoingConnectionHttps expects a host name, not a URL.
    val nakadiConnectionFlow = http.outgoingConnectionHttps("nakadi-url.com", 443)
    val getRequest = HttpRequest(HttpMethods.GET, "/")
    val eventBatchSource: Source[EventBatch, NotUsed] =
      // The stream starts with a single request object ...
      Source.single(getRequest)
        // ... that goes through a connection (i.e. is sent to the server)
        .via(nakadiConnectionFlow)
        .flatMapConcat {
          case response @ HttpResponse(StatusCodes.OK, _, _, _) =>
            response.entity.dataBytes
              // Decompress deflate-compressed bytes.
              .via(Deflate.decoderFlow)
              // Coalesce chunks into a line.
              .via(Framing.delimiter(ByteString("\n"), Int.MaxValue))
              // Deserialize JSON.
              .map(bs => Json.read[EventBatch](bs.utf8String))
          // process erroneous responses
        }
    eventBatchSource.map(...).to(...) // process batches

  21. GraphDSL

  22. GraphDSL
    import akka.stream._
    import akka.stream.scaladsl._
    import akka.stream.scaladsl.GraphDSL.Implicits._

    RunnableGraph.fromGraph(GraphDSL.create() { implicit builder =>
      val A: Outlet[Int]                  = builder.add(Source.single(0)).out
      val B: UniformFanOutShape[Int, Int] = builder.add(Broadcast[Int](2))
      val C: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
      val D: FlowShape[Int, Int]          = builder.add(Flow[Int].map(_ + 1))
      val E: UniformFanOutShape[Int, Int] = builder.add(Balance[Int](2))
      val F: UniformFanInShape[Int, Int]  = builder.add(Merge[Int](2))
      val G: Inlet[Any]                   = builder.add(Sink.foreach(println)).in

      // Wire the stages together; `C <~ F` feeds F's output back into C.
                    C     <~      F
      A  ~>  B  ~>  C     ~>      F
             B  ~>  D  ~>  E  ~>  F
                           E  ~>  G

      ClosedShape
    })

  23. INTEGRATION WITH AKKA ACTORS
    ● An actor can be a Source or a Sink
    ● The back pressure protocol – normal actor messages
    import akka.stream.actor.{ActorPublisher, ActorPublisherMessage}

    class LongCounter extends ActorPublisher[Long] {
      private var counter = 0L

      override def receive: Receive = {
        // Downstream signals demand for n more elements.
        case ActorPublisherMessage.Request(n) =>
          // Emit exactly n elements, never more than the requested demand.
          for (_ <- 1L to n) {
            counter += 1
            onNext(counter)
          }
        // Downstream cancelled the subscription.
        case ActorPublisherMessage.Cancel =>
          context.stop(self)
      }
    }

  24. CONCLUSION
    ● Akka Streams – a way to build arbitrarily complex, type-safe data processing pipelines
    ● Complex inside, but the interface is reasonably simple
    ● Gives control over execution, including back pressure and asynchronous execution
    ● Don’t misuse it: it might not be suitable for the task

  25. WHERE TO GET INFORMATION
    ● The official documentation
    http://doc.akka.io/docs/akka/current/scala/stream/index.html
    ● Akka team blog
    http://blog.akka.io/

  26. BUILT ON TOP OF AKKA STREAMS
    ● Akka HTTP – HTTP client and server
    http://doc.akka.io/docs/akka-http/current/scala.html
    ● Alpakka – enterprise integration patterns (like Apache Camel) (WIP)
    http://developer.lightbend.com/docs/alpakka/current/

  27. BLOG POST VERSION OF THIS PRESENTATION
    My blog: https://ivanyu.me/blog/2016/12/12/about-akka-streams/
    Zalando blog: https://tech.zalando.com/blog/about-akka-streams/

  28. QUESTIONS?

  29. More about Zalando?
    FOLLOW US: #Zelsinki #ZalandoTech
    LinkedIn: Zalando SE
    Facebook & Instagram: @insidezalando
    Twitter: @ZalandoTech
    Tech Blog: https://jobs.zalando.com/tech/blog/
    CAREERS: https://jobs.zalando.com