Slide 1

The Joys of (Z)Streams
Itamar Ravid - @iravid_

Slide 2


Slide 3


Slide 4


Slide 5

Chunks to the rescue!

Slide 6


Slide 7

ZIO's Chunk

• Immutable, array-backed
• Keeps primitives unboxed (!!!)
• O(1) concatenation, O(1) slicing
• Extremely fast single-element append
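As a rough illustration of those properties (a sketch, assuming a `zio` dependency on the classpath; the values and the `ChunkDemo` object are just examples, not from the talk):

```scala
import zio.Chunk

object ChunkDemo {
  def main(args: Array[String]): Unit = {
    // Chunk.fromArray wraps the array directly; the Int elements
    // stay unboxed in the backing primitive array.
    val a: Chunk[Int] = Chunk.fromArray(Array(1, 2, 3))

    // Concatenation and slicing reuse the backing storage
    // instead of copying every element.
    val b: Chunk[Int] = (a ++ Chunk(4, 5)).drop(1)

    // Single-element append is cheap: the backing array is
    // over-allocated so most appends don't copy.
    val c: Chunk[Int] = b :+ 6

    println(c.toList) // List(2, 3, 4, 5, 6)
  }
}
```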

Slide 8

Streaming super-powers

```scala
for (line <- FileUtils.readFileToString(file).split('\n'))
  println(line)
```

Slide 9

Streaming super-powers

Slide 10

Streaming super-powers

With bounded memory:

```scala
ZStream.fromFile(file, chunkSize = 16384)
  .transduce(utf8Decode >>> splitLines)
  .foreach(console.putStrLn)
```

Slide 11


Slide 12


Slide 13


Slide 14

A stream a day keeps the doctor away

Slide 15

Parallelism and pipelining

```scala
val numbers = 1 to 1000

for {
  primes <- ZIO.foreachParN(20)(numbers)(isPrime)
  _      <- ZIO.foreachParN(20)(primes)(moreHardWork)
} yield ()
```

Slide 16

Parallelism

```scala
val numbers = 1 to 1000

ZStream
  .fromIterable(numbers)
  .mapMPar(20)(isPrime)
  .mapMPar(20)(moreHardWork)
```

Now we get pipelining!

Slide 17

Fibers and Queues

It's very tempting to do this:

```scala
for {
  input  <- Queue.bounded(16)
  middle <- Queue.bounded(16)
  output <- Queue.bounded(16)
  _      <- writeToInput(input).fork
  _      <- processBetweenQueues(input, middle).fork
  _      <- processBetweenQueues(middle, output).fork
  _      <- printElements(output).fork
} yield ()
```

Slide 18

Fibers and Queues

```scala
ZStream
  .repeatEffect(generateElements)
  .buffer(16)
  .mapM(process)
  .buffer(16)
  .mapM(process)
  .buffer(16)
  .tap(printElement)
```

Interruption safe, pipelined!

Slide 19


Slide 20

Common patterns

Slide 21


Slide 22

Multiple-callback hell

RabbitMQ recommends its driver be used in push mode:

```scala
val channel: Channel

channel.basicConsume("queue", autoAck = false, "consumer tag",
  new DefaultConsumer(channel) {
    def handleDelivery(body: Array[Byte]): Unit = ???
    def handleShutdown(): Unit = ???
    def handleCancel(): Unit = ???
  }
)
```

Slide 23

Multiple-callback hell

Stuff our entire logic into the callback?

```scala
channel.basicConsume("queue", autoAck = false, "consumer tag",
  new DefaultConsumer(channel) {
    def handleDelivery(body: Array[Byte]): Unit =
      deserialize(body) match {
        // ...
      }
  }
)
```

Slide 24

Multiple-callback hell

Enqueue somewhere?

```scala
channel.basicConsume("queue", autoAck = false, "consumer tag",
  new DefaultConsumer(channel) {
    def handleDelivery(body: Array[Byte]): Unit =
      queue.offer(body)
  }
)
```

Slide 25

Escaping multiple-callback hell

The ZStream solution for this is ZStream.effectAsync:

```scala
ZStream.effectAsync[Any, Throwable, Array[Byte]] { cb =>
  channel.basicConsume(
    new DefaultConsumer(channel) {
      def handleDelivery(body: Array[Byte]): Unit =
        cb(ZIO.succeed(body))
      def handleShutdown(): Unit =
        cb(ZIO.fail(None))
    }
  )
}
```

Slide 26

Escaping multiple-callback hell

From that call, we get a plain old:

```scala
ZStream[Any, Throwable, Array[Byte]]
```

Which we can freely compose as usual. No more callback contortions!
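For instance, the resulting stream combines with the other combinators in this talk (a sketch only; `rabbitStream`, `deserialize`, and `handleRecord` are hypothetical placeholders, not from the slides):

```scala
// Hypothetical: the stream obtained from ZStream.effectAsync above.
val rabbitStream: ZStream[Any, Throwable, Array[Byte]] = ???

val pipeline =
  rabbitStream
    .mapM(bytes => ZIO.effect(deserialize(bytes))) // hypothetical decoder
    .buffer(64)               // decouple the driver callback from processing
    .mapMPar(8)(handleRecord) // hypothetical processing effect, 8-way parallel
    .runDrain
```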

Slide 27

Escaping multiple-callback hell

effectAsyncInterrupt can specify how to cancel the subscription:

```scala
ZStream.effectAsyncInterrupt[Any, Throwable, Array[Byte]] { cb =>
  val consumerTag =
    channel.basicConsume(new DefaultConsumer(channel) { /* ... */ })
  Left(UIO(channel.basicCancel(consumerTag)))
}
```

Slide 28

Adaptive batching

Databases love batches. And with streams, we can easily batch things up for consumption in a database:

```scala
val dataStream: Stream[Throwable, Record]

dataStream
  .groupedWithin(2000, 30.seconds) // Stream[Throwable, List[Record]]
  .mapM(insertRows)
```

Slide 29

Adaptive batching

```scala
dataStream
  .aggregateAsyncWithin(
    ZTransducer.collectAllN(2000),
    Schedule.fixed(30.seconds)
  )
  .mapM(insertRows)
```

Slide 30

Adaptive batching

This approach favours throughput over latency. What if we want to balance the two?

Slide 31

Adaptive batching

We can ship things off as long as the path is clear:

```scala
val dataStream: Stream[Throwable, Record]

dataStream
  .aggregateAsync(ZTransducer.collectAllN(2000))
  .mapM(insertRows)
```

Slide 32

Adaptive batching

Thanks to Schedule, we can construct a sophisticated batch schedule:

```scala
val schedule: Schedule[Clock, List[Record], Unit] =
  // Start off with 30-second timeouts as long as batch size is < 1000,
  Schedule.fixed(30.seconds).whileInput(_.size < 1000) andThen
    // and then switch to a shorter, jittered schedule
    // for as long as batches remain over 1000.
    Schedule.fixed(5.seconds).jittered.whileInput(_.size >= 1000)
```

Slide 33

Adaptive batching

```scala
val schedule: Schedule[Clock, List[Record], Unit] =
  Schedule.fixed(30.seconds).whileInput(_.size < 1000) andThen
    Schedule.fixed(5.seconds).jittered.whileInput(_.size >= 1000)

dataStream
  .aggregateAsyncWithin(ZTransducer.collectAllN(2000), schedule)
  .mapM(insertRows)
```

Slide 34

Application composition

Long-running applications are often composed of multiple components:

```scala
val kafkaStream: ZStream[Blocking, Throwable, Record]
val httpServer: Task[Nothing]
val scheduledJobRunner: Task[Nothing]
```

Slide 35

Application composition

What should our main fiber do with all of these?

• Launch and wait so our app doesn't exit prematurely
• Interrupt everything when SIGTERM is received
• Watch all the fibers and quickly exit if something goes wrong

Slide 36

Application composition

Here's an approach that doesn't work:

```scala
val main =
  kafkaConsumer.runDrain.fork *>
    httpServer.fork *>
    scheduledJobRunner.fork *>
    ZIO.never
```

Slide 37

Application composition

So let's watch the fibers:

```scala
val managedApp =
  for {
    kafka <- kafkaConsumer.runDrain.forkManaged
    http  <- httpServer.forkManaged
    jobs  <- scheduledJobRunner.forkManaged
  } yield ZIO.raceAll(kafka.await, List(http.await, jobs.await))

val main =
  managedApp
    .use(identity)
    .flatMap(ZIO.done(_))
```

Slide 38


Slide 39

Application composition

```scala
val managedApp =
  for {
    _ <- ??? // other resources
    _ <- ZStream.mergeAllUnbounded(
           kafkaConsumer.drain,
           ZStream.fromEffect(httpServer),
           ZStream.fromEffect(scheduledJobRunner)
         ).runDrain.toManaged_
  } yield ()

val main = managedApp.use_(ZIO.unit).exitCode
```

Slide 40

Want to learn more?

Stream Processing with Scala
15-18.6, remote only
https://bit.ly/2AaEMeB

Slide 41

Thank you!