The Joys of (Z)Streams

Itamar Ravid

May 28, 2020

Transcript

  1. The Joys of (Z)Streams
    Itamar Ravid - @iravid_


  5. Chunks to the rescue!

  7. ZIO's Chunk
    • Immutable, array-backed
    • Keeps primitives unboxed (!!!)
    • O(1) concatenation, O(1) slicing
    • Extremely fast single-element append
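    A rough sketch of what those properties look like in use (the values here are
    illustrative):
    import zio.Chunk
    // Backed by a plain Array; Int elements stay unboxed.
    val a: Chunk[Int] = Chunk.fromArray(Array(1, 2, 3))
    val b: Chunk[Int] = Chunk(4, 5, 6)
    val joined = a ++ b                  // cheap concatenation
    val sliced = joined.drop(1).take(3)  // cheap slicing, no copying
    val grown  = joined :+ 7             // fast single-element append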

  8. Streaming super-powers
    for (line <- FileUtils.readFileToString(file).split('\n'))
      println(line)

  9. Streaming super-powers

  10. Streaming super-powers
    With bounded memory:
    ZStream.fromFile(file, chunkSize = 16384)
      .transduce(utf8Decode >>> splitLines)
      .foreach(console.putStrLn)
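    For reference, a minimal sketch of a complete program around this snippet,
    assuming the ZIO 1.0 API (the file path is just an example):
    import java.nio.file.Paths
    import zio._
    import zio.stream.{ ZStream, ZTransducer }

    object PrintFile extends App {
      def run(args: List[String]): URIO[ZEnv, ExitCode] =
        ZStream
          .fromFile(Paths.get("access.log"), chunkSize = 16384)
          .transduce(ZTransducer.utf8Decode >>> ZTransducer.splitLines)
          .foreach(line => console.putStrLn(line))
          .exitCode
    }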

  14. A stream a day
    keeps the doctor away

  15. Parallelism and pipelining
    val numbers = 1 to 1000
    for {
      primes <- ZIO.foreachParN(numbers, 20)(isPrime)     // all 1000 checks finish first...
      _      <- ZIO.foreachParN(primes, 20)(moreHardWork) // ...only then does this stage start
    } yield ()

  16. Parallelism
    val numbers = 1 to 1000
    ZStream
    .fromIterable(numbers)
    .mapMPar(20)(isPrime)
    .mapMPar(20)(moreHardWork)
    Now we get pipelining!
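    A sketch of running this end to end, with illustrative stand-ins for the
    slide's isPrime and moreHardWork; nothing executes until the stream is run:
    import zio._
    import zio.stream.ZStream

    object PrimePipeline {
      // Illustrative stand-ins for the slide's isPrime and moreHardWork.
      def isPrime(n: Int): Task[Boolean]           = Task(BigInt(n).isProbablePrime(20))
      def moreHardWork(prime: Boolean): Task[Unit] = ZIO.unit

      // Each stage keeps up to 20 effects in flight, and the second stage
      // starts consuming results as soon as the first stage emits them.
      val program: Task[Unit] =
        ZStream
          .fromIterable(1 to 1000)
          .mapMPar(20)(isPrime)
          .mapMPar(20)(moreHardWork)
          .runDrain
    }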

  17. Fibers and Queues
    It's very tempting to do this:
    for {
      input  <- Queue.bounded(16)
      middle <- Queue.bounded(16)
      output <- Queue.bounded(16)
      _      <- writeToInput(input).fork
      _      <- processBetweenQueues(input, middle).fork
      _      <- processBetweenQueues(middle, output).fork
      _      <- printElements(output).fork
    } yield ()

  18. Fibers and Queues
    ZStream
    .repeatEffect(generateElements)
    .buffer(16)
    .mapM(process)
    .buffer(16)
    .mapM(process)
    .buffer(16)
    .tap(printElement)
    Interruption safe, pipelined!
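    A sketch of shutting such a pipeline down, where pipeline is a hypothetical
    val bound to the stream above: interrupting the fiber that runs it tears
    down every stage and buffer.
    import zio._
    import zio.duration._

    // `pipeline` is a hypothetical val holding the buffered stream above.
    for {
      fiber <- pipeline.runDrain.fork  // run the whole pipeline on one fiber
      _     <- ZIO.sleep(30.seconds)   // ...let it work for a while...
      _     <- fiber.interrupt         // shuts down every stage and buffer
    } yield ()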

  20. Common patterns

  22. Multiple-callback hell
    RabbitMQ recommends using its driver in push mode:
    val channel: Channel
    channel.basicConsume("queue", autoAck = false, "consumer tag",
      new DefaultConsumer(channel) {
        def handleDelivery(body: Array[Byte]): Unit = ???
        def handleShutdown(): Unit = ???
        def handleCancel(): Unit = ???
      }
    )

  23. Multiple-callback hell
    Stuff our entire logic into the callback?
    channel.basicConsume("queue", autoAck = false, "consumer tag",
      new DefaultConsumer(channel) {
        def handleDelivery(body: Array[Byte]): Unit =
          deserialize(body) match {
            // ... the entire processing logic ...
          }
      }
    )

  24. Multiple-callback hell
    Enqueue somewhere?
    channel.basicConsume("queue", autoAck = false, "consumer tag",
      new DefaultConsumer(channel) {
        def handleDelivery(body: Array[Byte]): Unit =
          queue.offer(body)
      }
    )

  25. Escaping multiple callback hell
    The ZStream solution for this is ZStream.effectAsync:
    ZStream.effectAsync[Any, Throwable, Array[Byte]] { cb =>
      channel.basicConsume(
        new DefaultConsumer(channel) {
          def handleDelivery(body: Array[Byte]): Unit =
            cb(ZIO.succeed(body))
          def handleShutdown(): Unit =
            cb(ZIO.fail(None))
        }
      )
    }

  26. Escaping multiple callback hell
    From that call, we get a plain old:
    ZStream[Any, Throwable, Array[Byte]]
    Which we can freely compose as usual. No more callback
    contortions!
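    For example, a sketch of composing it further downstream, where rabbitStream
    stands for the effectAsync stream above and deserialize / handleRecord are
    illustrative placeholders:
    rabbitStream                                   // ZStream[Any, Throwable, Array[Byte]]
      .mapM(bytes => ZIO.fromEither(deserialize(bytes)))
      .mapMPar(8)(handleRecord)
      .runDrain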

  27. Escaping multiple callback hell
    With effectAsyncInterrupt, we can also specify how to cancel the
    subscription:
    ZStream.effectAsyncInterrupt[Any, Throwable, Array[Byte]] { cb =>
      val consumerTag =
        channel.basicConsume(new DefaultConsumer(channel) { /* ... as before ... */ })
      Left(UIO(channel.basicCancel(consumerTag)))
    }

  28. Adaptive batching
    Databases love batches. And with streams, we can easily batch
    things up for consumption in a database:
    val dataStream: Stream[Throwable, Record]
    dataStream
      .groupedWithin(2000, 30.seconds) // Stream[Throwable, List[Record]]
      .mapM(insertRows)

  29. Adaptive batching
    Databases love batches. And with streams, we can easily batch
    things up for consumption in a database:
    val dataStream: Stream[Throwable, Record]
    dataStream
      .aggregateAsyncWithin(
        ZTransducer.collectAllN(2000),
        Schedule.fixed(30.seconds)
      )
      .mapM(insertRows)
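    A sketch of the consuming side, with an illustrative insertRows (a real one
    would call the database driver); each batch arrives as a List[Record]:
    import zio._

    // `Record` is the slide's placeholder element type; this stand-in just
    // logs the batch size instead of performing a real bulk insert.
    def insertRows(batch: List[Record]): Task[Unit] =
      Task(println(s"inserting a batch of ${batch.size} rows"))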

  30. Adaptive batching
    This approach favours throughput over latency.
    What if we want to balance the two?

  31. Adaptive batching
    We can ship things off as long as the path is clear:
    val dataStream: Stream[Throwable, Record]
    dataStream
    .aggregateAsync(ZTransducer.collectAllN(2000))
    .mapM(insertRows)

  32. Adaptive batching
    Thanks to Schedule, we can construct a sophisticated batch
    schedule:
    val schedule: Schedule[Clock, List[Record], Unit] =
      // Start off with 30-second timeouts as long as
      // batch size is < 1000...
      Schedule.fixed(30.seconds).whileInput(_.size < 1000) andThen
        // ...and then switch to a shorter, jittered schedule
        // for as long as batches remain over 1000
        Schedule.fixed(5.seconds).jittered.whileInput(_.size >= 1000)

  33. Adaptive batching
    Thanks to Schedule, we can construct a sophisticated batch
    schedule:
    val schedule: Schedule[Clock, List[Record], Unit] =
      Schedule.fixed(30.seconds).whileInput(_.size < 1000) andThen
        Schedule.fixed(5.seconds).jittered.whileInput(_.size >= 1000)
    dataStream
      .aggregateAsyncWithin(collectAllN(2000), schedule)
      .mapM(insertRows)

  34. Application composition
    Long-running applications are often composed of multiple
    components:
    val kafkaStream: ZStream[Blocking, Throwable, Record]
    val httpServer: Task[Nothing]
    val scheduledJobRunner: Task[Nothing]

  35. Application composition
    What should our main fiber do with all of these?
    • Launch and wait so our app doesn't exit prematurely
    • Interrupt everything when SIGTERM is received
    • Watch all the fibers and quickly exit if something goes wrong

  36. Application composition
    Here's an approach that doesn't work:
    val main =
      kafkaConsumer.runDrain.fork *>
        httpServer.fork *>
        scheduledJobRunner.fork *>
        ZIO.never

  37. Application composition
    So let's watch the fibers:
    val managedApp =
      for {
        kafka <- kafkaConsumer.runDrain.forkManaged
        http  <- httpServer.forkManaged
        jobs  <- scheduledJobRunner.forkManaged
      } yield ZIO.raceAll(kafka.await, List(http.await, jobs.await))
    val main = managedApp
      .use(identity)
      .flatMap(ZIO.done(_))

  39. Application composition
    val managedApp =
      for {
        _ <- ... // other resources
        _ <- ZStream.mergeAllUnbounded(
               kafkaConsumer.drain,
               ZStream.fromEffect(httpServer),
               ZStream.fromEffect(scheduledJobRunner)
             ).runDrain.toManaged_
      } yield ()
    val main = managedApp.use_(ZIO.unit).exitCode
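    Finally, a minimal sketch of wiring this into an entry point, assuming
    managedApp and the components above are in scope (the object name is
    illustrative):
    import zio._

    object Main extends zio.App {
      def run(args: List[String]): URIO[ZEnv, ExitCode] =
        managedApp.use_(ZIO.unit).exitCode
    }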

  40. Want to learn more?
    Stream Processing with Scala
    15-18.6, remote only
    https://bit.ly/2AaEMeB

  41. Thank you!