Slide 1

Slide 1 text

Reactive Streams: control flow, back-pressure, akka-streams
Rhein-Main Scala Enthusiasts

Slide 2

Slide 2 text

About me - Alexey Novakov
• Working at dataWerks
• 10 years with the JVM, 3 years with Scala
• Focusing on distributed systems
• Did online courses for learning the Java language

Slide 3

Slide 3 text

What is Reactive Streams?
• An initiative to provide a standard for asynchronous stream processing with non-blocking back-pressure (JVM & JavaScript)
• Started by Lightbend, Pivotal, Netflix and others
http://www.reactive-streams.org

Slide 4

Slide 4 text

JVM Interfaces
At Maven Central: API + Technology Compatibility Kit
"org.reactivestreams" % "reactive-streams" % "1.0.1"
"org.reactivestreams" % "reactive-streams-tck" % "1.0.1" % "test"
Now also in JDK 9 as java.util.concurrent.Flow, which is a copy of the RS API.
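
As an sbt snippet this would look roughly as follows (versions as on the slide):

// build.sbt (sketch): the Reactive Streams API plus the TCK for verifying implementations
libraryDependencies ++= Seq(
  "org.reactivestreams" % "reactive-streams"     % "1.0.1",
  "org.reactivestreams" % "reactive-streams-tck" % "1.0.1" % "test"
)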

Slide 5

Slide 5 text

Content
Keywords:
➔ publisher, subscriber, processor, subscription
➔ data stream processing
➔ synchronous / asynchronous
➔ back-pressure

Slide 6

Slide 6 text

Reactive Manifesto
Reactive Streams is also related to the Reactive Manifesto.
http://www.reactivemanifesto.org

Slide 7

Slide 7 text

Stream parts (diagram)
Publisher (message queue, HTTP connection, database connection, file, etc.)
  ~ subscription ~> Processor (some data transformation function)
  ~ subscription ~> Subscriber (console, TCP, etc.)
Elements may be ordered or unordered.

Slide 8

Slide 8 text

Typical Scenarios: unbounded data processing

Slide 9

Slide 9 text

Publisher & Subscriber (diagram)
Flow: Source -> Stage 1 -> Stage 2 -> Stage 3 -> Stage 4 -> Sink
Data is constantly moving from Source to Sink.
Each flow stage can be sync or async.

Slide 10

Slide 10 text

Publisher & Subscriber (diagram)
Source: "I have 10K messages for you"
Sink: "Well, I still have unfinished work"
Some time later: OutOfMemory. R.I.P.

Slide 11

Slide 11 text

Problem situations
1) Slow Publisher, Fast Subscriber
2) Fast Publisher, Slow Subscriber
The Publisher also has to deal with its own back-pressure.
(Speech bubbles on the slide: "I am busy, wait…", "Not so fast, I am working…", "I always have something for you")

Slide 12

Slide 12 text

Stream w/ back-pressure (diagram)
Source: "I have 10K messages for you"
Sink: "Ok, give me the next 30 messages" / "Could you slow down? I have no space for those messages"
Source: "Sure, just let me know when you are ready"
The Subscriber signals its demand. No buffer overflow anymore.

Slide 13

Slide 13 text

Interfaces
Publisher:    void subscribe(subscriber)
Subscriber:   onSubscribe(s)  onNext*(e)  (onError(t) | onComplete)?
Subscription: request(n)  cancel
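
For reference, the interfaces behind these method names, rendered as Scala traits for readability (the real API is Java; this is only a sketch):

trait Publisher[T] {
  def subscribe(s: Subscriber[_ >: T]): Unit
}

trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit   // called first, carries the Subscription
  def onNext(element: T): Unit             // called 0..N times, never more than requested
  def onError(t: Throwable): Unit          // terminal event
  def onComplete(): Unit                   // terminal event
}

trait Subscription {
  def request(n: Long): Unit               // signal demand for up to n more elements
  def cancel(): Unit
}

trait Processor[T, R] extends Subscriber[T] with Publisher[R]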

Slide 14

Slide 14 text

Back-pressure
• The Subscriber tells the Publisher how many messages it can process: request(n)
• The Publisher sends at most the requested amount: onNext(m)
• It is a simple protocol that enables dynamic "push-pull" communication
➢ Propagated through the entire stream (Source -> Sink)
➢ Enables bounded queues/buffers
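
A minimal sketch of this request(n)/onNext(m) exchange: a Subscriber that asks for a batch, processes it, then asks again (the class name, batch size and process callback are made up for illustration):

import org.reactivestreams.{Subscriber, Subscription}

class BatchedSubscriber[T](batchSize: Int)(process: T => Unit) extends Subscriber[T] {
  private var subscription: Subscription = _
  private var remaining = 0L

  override def onSubscribe(s: Subscription): Unit = {
    subscription = s
    remaining = batchSize
    s.request(batchSize)                      // request(n): initial demand
  }

  override def onNext(element: T): Unit = {   // at most `remaining` onNext calls arrive
    process(element)
    remaining -= 1
    if (remaining == 0) {                     // batch consumed: signal fresh demand
      remaining = batchSize
      subscription.request(batchSize)
    }
  }

  override def onError(t: Throwable): Unit = t.printStackTrace()
  override def onComplete(): Unit = ()
}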

Slide 15

Slide 15 text

Implementations
• Akka Streams
• MongoDB
• Ratpack
• Reactive Rabbit
• Reactor
• RxJava
• Slick
• Vert.x 3.0
• Monix

Slide 16

Slide 16 text

Streams Implementation
• appeared around 2014
• uses Actors behind the scenes
• provides Scala and Java DSLs
• driven by the Lightbend Akka Team
• simplifies the usage of Actors, in some sense

Slide 17

Slide 17 text

Example 1
Source (Publisher): A .. Z
Flow (Processor): A + B + C + D…
Sink (Subscriber): print(ABCD…)

Slide 18

Slide 18 text

Example 1

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

// these guys need to be around
implicit val system = ActorSystem("Example1")
implicit val materializer = ActorMaterializer()

// stream parts
val source = Source('A' to 'Z')
val fold   = Flow[Char].fold("")(_ + _)
val sink   = Sink.foreach[String](println)

// bind and execute in a separate thread
source.via(fold).to(sink).run()

Output: ABCDEFGHIJKLMNOPQRSTUVWXYZ

Slide 19

Slide 19 text

Get a value back
• Sometimes you need to run a short-term stream and get a side value as its result
• It can be some metric
• Or the last element of the executed stream, etc.
• Akka Streams calls this process Materialization

Slide 20

Slide 20 text

Example of mat value

val source = Source(1 to 100)
val evens  = Flow[Int].filter(_ % 2 == 0)
val sink   = Sink.fold[Int, Int](0)(_ + _)

// Keep.right keeps the Sink's materialized value: the Future with the fold result
val g: RunnableGraph[Future[Int]] = source.via(evens).toMat(sink)(Keep.right)

val sum: Future[Int] = g.run()
sum.foreach(print)   // needs an implicit ExecutionContext, e.g. system.dispatcher

Output: 2550

Slide 21

Slide 21 text

Akka-Streams
• Source, Flow, Sink
• RunnableGraph (blueprint)
• Materialized values: a graph may produce some value as the result of stream execution
• Runtime: starting Actors, opening sockets, providing other resources
• Order of streaming elements is preserved
• The API resembles the Scala collections standard library
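
A tiny sketch of the blueprint idea: the same RunnableGraph value can be run more than once, and every run() is an independent materialization (assumes the implicit system/materializer from Example 1):

import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.Future

val blueprint: RunnableGraph[Future[Int]] =
  Source(1 to 10).toMat(Sink.fold[Int, Int](0)(_ + _))(Keep.right)

val firstRun: Future[Int]  = blueprint.run()   // starts its own actors/resources
val secondRun: Future[Int] = blueprint.run()   // a completely independent materialization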

Slide 22

Slide 22 text

Stream Materialization
• By default processing stages are fused:
  - only one Actor will be used
  - single-threaded processing

Source(List(1, 2, 3))
  .map(_ + 1)
  .map(_ * 2)
  .to(Sink.ignore)

Slide 23

Slide 23 text

Stream Materialization
• Use the "async" combinator to run on multiple actors

Source(List(1, 2, 3))
  .map(_ + 1).async   // async boundary
  .map(_ * 2)
  .to(Sink.ignore)

Slide 24

Slide 24 text

Example 2 – Stock Exchange Stream (diagram)
Order Gateway      -> Source[Order]
Order Id Generator -> Flow[Order, PreparedOrder]
Persistence        -> Flow[PreparedOrder, LoggedOrder] (MySQL Orders)
Order Processor    -> Flow[LoggedOrder, ExecuteOrder]
Executor           -> Flow[ExecuteOrder, PartialFills] (MySQL Execution)
Order Logger       -> Sink[PartialFills]
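
The element types on these arrows are not spelled out on the slide; written as hypothetical case classes (names from the diagram, fields are assumptions) they could look like:

case class Order(symbol: String, quantity: Int, price: BigDecimal)
case class PreparedOrder(order: Order, seqNo: Long)                 // Order plus generated id
case class LoggedOrder(prepared: PreparedOrder)                     // persisted to MySQL Orders
case class ExecuteOrder(logged: LoggedOrder)
case class PartialFills(executed: ExecuteOrder, fills: List[Int])   // written to MySQL Execution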

Slide 25

Slide 25 text

// orderGateway is an ActorRef
val orderPublisher = ActorPublisher[Order](orderGateway)

Source.fromPublisher(orderPublisher)
  .via(OrderIdGenerator())
  .via(OrderPersistence(orderDao))
  .via(OrderProcessor())
  .via(OrderExecutor())
  .runWith(Sink.actorSubscriber(orderLogger))

// testing: send some orders to the publisher actor
// (a plain tell is not aware of back-pressure)
1 to 1000 foreach { _ => orderGateway ! generateRandomOrder }
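
Since a plain tell (!) ignores demand, one back-pressure-aware alternative (not shown in the talk) is Source.queue, whose offer() only completes once the stream has accepted the element; a sketch reusing the slide's flows:

import akka.stream.{OverflowStrategy, QueueOfferResult}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

val queue = Source.queue[Order](bufferSize = 100, OverflowStrategy.backpressure)
  .via(OrderIdGenerator())
  .via(OrderPersistence(orderDao))
  .via(OrderProcessor())
  .via(OrderExecutor())
  .to(Sink.actorSubscriber(orderLogger))
  .run()

// the Future completes only when the element was enqueued, so a fast producer is slowed down
val accepted: Future[QueueOfferResult] = queue.offer(generateRandomOrder)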

Slide 26

Slide 26 text

object OrderIdGenerator {

  def apply(): Flow[Order, PreparedOrder, NotUsed] = {
    var seqNo: Long = 0

    def nextSeqNo(): Long = {
      seqNo += 1
      seqNo
    }

    Flow.fromFunction(o => PreparedOrder(o, nextSeqNo()))
  }
}
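
The var above is captured by the returned Flow, so it would be shared if that same Flow value were materialized more than once; statefulMapConcat (an alternative not used in the talk) keeps the counter per materialization:

import akka.NotUsed
import akka.stream.scaladsl.Flow

def orderIdGenerator: Flow[Order, PreparedOrder, NotUsed] =
  Flow[Order].statefulMapConcat { () =>
    var seqNo = 0L                          // fresh state for every materialization
    order => {
      seqNo += 1
      PreparedOrder(order, seqNo) :: Nil    // emit exactly one element per input
    }
  }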

Slide 27

Slide 27 text

Example 2 alt. Fan Out (diagram)
Same pipeline as before (Order Gateway -> Order Id Generator -> Persistence / Order Processor -> Executor), plus:
• Broadcast after the Order Id Generator (to Persistence and to the Order Processor)
• Load Balance across several Order Loggers

Slide 28

Slide 28 text

Example 2 alt.: Graph DSL

// inside a GraphDSL.create block (see the skeleton below)
val bcast    = b.add(Broadcast[PreparedOrder](2))
val balancer = b.add(Balance[PartialFills](workers))

val S     = b.add(Source.fromGraph(orderSource))
val IdGen = b.add(OrderIdGenerator())
val A     = b.add(OrderPersistence(orderDao).to(Sink.ignore))
val B     = b.add(OrderProcessor2())
val C     = b.add(OrderExecutor())

S ~> IdGen ~> bcast
              bcast ~> A
              bcast ~> B ~> C ~> balancer

for (i <- 0 until workers)
  balancer ~> b.add(Sink.fromGraph(orderLogger).named(s"logger-$i"))
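
For orientation, wiring like this normally sits inside GraphDSL.create, roughly like the following skeleton (the builder b and the ~> operator come from there; assumes an implicit materializer):

import akka.NotUsed
import akka.stream.ClosedShape
import akka.stream.scaladsl.{GraphDSL, RunnableGraph}

val graph: RunnableGraph[NotUsed] =
  RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._        // brings ~> into scope
    // ... the bcast / balancer / S / IdGen / A / B / C wiring from the slide ...
    ClosedShape                        // a runnable graph has no open inlets or outlets
  })

graph.run()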

Slide 29

Slide 29 text

Building Blocks: stage shapes with their Inlets and Outlets (diagram)

Slide 30

Slide 30 text

/**
 * A bidirectional flow of elements that consequently has two inputs and two
 * outputs, arranged like this:
 *
 * {{{
 *        +------+
 *  In1 ~>|      |~> Out1
 *        | bidi |
 * Out2 <~|      |<~ In2
 *        +------+
 * }}}
 */
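
A tiny illustrative BidiFlow built from two plain functions (the String/ByteString codec is made up for the example):

import akka.NotUsed
import akka.stream.scaladsl.BidiFlow
import akka.util.ByteString

// outbound (In1 ~> Out1): encode Strings to bytes; inbound (In2 ~> Out2): decode bytes back
val codec: BidiFlow[String, ByteString, ByteString, String, NotUsed] =
  BidiFlow.fromFunctions(
    (s: String) => ByteString(s),
    (b: ByteString) => b.utf8String
  )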

Slide 31

Slide 31 text

Nesting

Slide 32

Slide 32 text

Cyclic Graph

Slide 33

Slide 33 text

Example 3 – Twitter Stream
• Let's implement a WordCount over the infinite Twitter Stream
• We can use the free API: Filter Realtime Tweets
  - https://stream.twitter.com/1.1/statuses/filter.json
  - HTTP chunked response
• Just register your Twitter app to get a consumer key

Slide 34

Slide 34 text

Example 3 – Twitter Stream (stage diagram)
scan[ByteString] -> filter[String] -> map[Tweet] -> scan[String] -> foreach[String]

Slide 35

Slide 35 text

val response = Http().singleRequest(httpRequest)

response.foreach { resp =>
  resp.status match {
    case OK =>
      val source: Source[ByteString, Any] =
        resp.entity.withoutSizeLimit().dataBytes
      // ...
  }
}

Hi, akka-http :-)

Slide 36

Slide 36 text

source
  .scan("")((acc, curr) =>
    if (acc.contains("\r\n")) curr.utf8String   // resetting the accumulator here
    else acc + curr.utf8String
  )
  .filter(_.contains("\r\n")).async
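
The built-in Framing stage can do the same chunk-to-line reassembly (an alternative to the hand-rolled scan above, not what the talk uses):

import akka.stream.scaladsl.Framing
import akka.util.ByteString

val lines = source
  .via(Framing.delimiter(ByteString("\r\n"), maximumFrameLength = 65536, allowTruncation = true))
  .map(_.utf8String)   // one complete tweet JSON per element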

Slide 37

Slide 37 text

scanLeft https://superruzafa.github.io/visual-scala-reference/index.html#scanLeft

Slide 38

Slide 38 text

// starting from this stage, the flow is concurrent
.scan(Map.empty[String, Int]) { (acc, text) =>
  val wc = tweetWordCount(text)
  ListMap((acc combine wc).toSeq.sortBy(-_._2).take(uniqueBuckets): _*)
}

Slide 39

Slide 39 text

def tweetWordCount(text: String): Map[String, Int] = {
  text.split(" ")
    .filter(s => s.trim.nonEmpty && s.matches("\\w+"))
    .map(_.trim.toLowerCase)
    .filterNot(stopWords.contains)
    .foldLeft(Map.empty[String, Int]) { (count, word) =>
      count |+| Map(word -> 1)
    }
}

Slide 40

Slide 40 text

.runForeach { wc =>
  val stats = wc.take(topCount)
    .map { case (k, v) => k + ":" + v }
    .mkString(" ")
  print("\r" + stats)
}

Slide 41

Slide 41 text

Project Alpakka
Source/Flow/Sink implementations for many popular data sources:
• AMQP Connector
• Apache Geode Connector
• AWS DynamoDB Connector
• AWS Kinesis Connector
• AWS Lambda Connector
• AWS S3 Connector
• AWS SNS Connector
• AWS SQS Connector
• Azure Storage Queue Connector
• Cassandra Connector
• Elasticsearch Connector
• File Connectors
• FTP Connector
• Google Cloud Pub/Sub
• HBase Connector
• IronMQ Connector
• JMS Connector
• MongoDB Connector
• MQTT Connector
• Server-sent Events (SSE) Connector
• Slick (JDBC) Connector
• Spring Web
• File IO
• Azure
• Camel
• Eventuate
• FS2
• HTTP Client
• MongoDB
• Kafka
• TCP

Slide 42

Slide 42 text

Konrad Malawski at JavaOne 2017 https://www.youtube.com/watch?v=KbZ-psFJ-fQ

Slide 43

Slide 43 text

Going to Production
• Configure your ExecutionContext
• Set a Supervision strategy to react to failures
• Think about / test which stages can be fused and which should run concurrently
• Think about grouping elements for better throughput
• Set an Overflow strategy
• Think about rate limiting using the throttle combinator
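
Several of these knobs in one hedged sketch (the numbers are placeholders, not recommendations; assumes an implicit system/materializer):

import scala.concurrent.duration._
import akka.stream.{ActorAttributes, OverflowStrategy, Supervision, ThrottleMode}
import akka.stream.scaladsl.{Sink, Source}

Source(1 to 1000000)
  .buffer(256, OverflowStrategy.backpressure)              // explicit overflow strategy
  .grouped(100)                                            // group elements for throughput
  .throttle(10, 1.second, 10, ThrottleMode.Shaping)        // rate limiting via throttle
  .withAttributes(ActorAttributes.supervisionStrategy(Supervision.resumingDecider)) // failure handling
  .async                                                   // explicit async boundary
  .to(Sink.ignore)
  .run()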

Slide 44

Slide 44 text

Thank you! Questions?

More to learn:
• https://doc.akka.io/docs/akka/2.5.6/scala/stream/ - Official documentation
• https://github.com/reactive-streams/reactive-streams-jvm - Reactive Streams specification
• https://blog.redelastic.com/diving-into-akka-streams-2770b3aeabb0 - Kevin Webber, Diving into Akka Streams
• http://blog.colinbreck.com/patterns-for-streaming-measurement-data-with-akka-streams/ - Colin Breck, Patterns for Streaming Measurement Data with Akka Streams
• https://github.com/novakov-alexey/meetup-akka-streams - Examples source code