
Stream Driven Development - Design your data pipeline with Akka Streams

Akka Streams is a well-known Reactive Streams implementation that helps you build asynchronous, data-intensive applications with no predetermined data volumes. But how can you leverage its full power to design complex, Akka-backed reactive pipelines? At HomeAway.com we devised an approach to this problem that combines elements of Domain-Driven Design with the abstraction power of the Akka Streams model. In this talk we'll present our approach by example, discussing useful patterns to:

- reason about your streaming application and identify its building blocks
- let the types drive the implementation
- handle failure
- instrument your application for logging and monitoring purposes

Stefano Bonetti

October 19, 2017

Transcript

  1. © 2017 HOMEAWAY. ALL RIGHTS RESERVED.

    OBJECTIVE: quickly roll out pipelines which are RESILIENT, SCALABLE
  2. OBJECTIVE: quickly roll out pipelines which are RESILIENT, SCALABLE, REACTIVE
  3. OBJECTIVE: quickly roll out pipelines which are RESILIENT, SCALABLE streams
  4. STREAMS STAGES

    - SOURCE (1 output)
    - FLOW (1 input, 1 output)
    - SINK (1 input)
    - FAN-IN (n inputs, 1 output)
    - FAN-OUT (1 input, n outputs)
    - RUNNABLEGRAPH (no input or output)
    - CUSTOM, ...
  5. STAGE ARITHMETIC

    - Source ~> Flow = Source
    - Flow ~> Sink = Sink
    - Flow ~> Flow = Flow
    - Broadcast ~> Merge = Flow
    - Source ~> Sink = RunnableGraph
    - Source ~> Flow ~> Sink = RunnableGraph
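The stage arithmetic above can be checked with a small runnable sketch. It assumes Scala 2.13 and akka-stream 2.6 on the classpath (the talk dates from 2017, so the deck may target an older Akka). The `Int` elements, the `+ 100` branch and all value names are illustrative only; `GraphDSL` wires the Broadcast/Merge pair into a graph with a single input and a single output, which is exactly what makes it a Flow.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.FlowShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Keep, Merge, RunnableGraph, Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

object StageArithmetic {
  val numbers: Source[Int, NotUsed] = Source(1 to 3)
  val double: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)

  // Source ~> Flow = Source
  val doubled: Source[Int, NotUsed] = numbers.via(double)
  // Flow ~> Sink = Sink
  val doubleAndIgnore: Sink[Int, NotUsed] = double.to(Sink.ignore)
  // Flow ~> Flow = Flow
  val quadruple: Flow[Int, Int, NotUsed] = double.via(double)

  // Broadcast ~> Merge = Flow: one open inlet (broadcast.in) and one open outlet (merge.out)
  val fanOutFanIn: Flow[Int, Int, NotUsed] = Flow.fromGraph(GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._
    val broadcast = b.add(Broadcast[Int](2))
    val merge = b.add(Merge[Int](2))
    broadcast.out(0) ~> merge.in(0)
    broadcast.out(1) ~> Flow[Int].map(_ + 100) ~> merge.in(1)
    FlowShape(broadcast.in, merge.out)
  })

  // Source ~> Flow ~> Sink = RunnableGraph (nothing runs until .run())
  val graph: RunnableGraph[Future[Seq[Int]]] =
    numbers.via(fanOutFanIn).toMat(Sink.seq[Int])(Keep.right)

  def runDemo(): (Seq[Int], Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("stage-arithmetic")
    try {
      val viaQuadruple = Await.result(numbers.via(quadruple).runWith(Sink.seq[Int]), 3.seconds)
      val viaFanOutFanIn = Await.result(graph.run(), 3.seconds)
      (viaQuadruple, viaFanOutFanIn)
    } finally Await.result(system.terminate(), 3.seconds)
  }
}
```

`Merge` emits elements as they arrive from either input, so only the sorted output of the fan-out/fan-in flow is predictable.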
  6. Source[PropertyChange, NotUsed] ~> Flow[PropertyChange, AdwordsChange, NotUsed] ~> Sink[AdwordsChange, Future[Done]] = RunnableGraph
  7. Source[PropertyChange, NotUsed] ~> Flow[PropertyChange, AdwordsChange, NotUsed] ~> Sink[AdwordsChange, Future[Done]] = RunnableGraph[Future[Done]]
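The step from `RunnableGraph` to `RunnableGraph[Future[Done]]` is about which materialized value the assembled graph exposes. A small sketch, assuming Scala 2.13 and Akka 2.6, with `String` stages standing in for the hypothetical `PropertyChange`/`AdwordsChange` stages:

```scala
import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Keep, RunnableGraph, Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

object MaterializedValues {
  // Stand-ins for Source[PropertyChange, NotUsed] and Flow[PropertyChange, AdwordsChange, NotUsed]
  val source: Source[String, NotUsed] = Source(List("a", "b", "c"))
  val flow: Flow[String, String, NotUsed] = Flow[String].map(_.toUpperCase)

  // `to` keeps the left-hand materialized value, so the sink's Future[Done] is dropped
  val keepLeft: RunnableGraph[NotUsed] = source.via(flow).to(Sink.ignore)

  // `toMat(...)(Keep.right)` keeps the sink's value instead
  val keepRight: RunnableGraph[Future[Done]] = source.via(flow).toMat(Sink.ignore)(Keep.right)

  def runDemo(): (NotUsed, Done) = {
    implicit val system: ActorSystem = ActorSystem("materialized-values")
    try {
      val left: NotUsed = keepLeft.run()                        // nothing to wait on
      val done: Done = Await.result(keepRight.run(), 3.seconds) // completion can be awaited
      (left, done)
    } finally Await.result(system.terminate(), 3.seconds)
  }
}
```

Exposing `Future[Done]` gives the application layer a handle on completion and failure of the whole pipeline.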
  9. SIDE EFFECTS

    - Error Handling, Logging, Monitoring, ...
    - EVENTS: Errors, Audit, ...
  10. SIDE EFFECTS

    - Error Handling, Logging, Monitoring, ...
    - EVENTS: Errors, Audit, ...
    - REFERENTIALLY TRANSPARENT STAGES: Flow[In, Either[Error, Out], Mat], Flow[In, (Seq[Audit], Out), Mat], ...
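One way to read the REFERENTIALLY TRANSPARENT STAGES idea: keep the business logic a pure function that returns its audit events alongside its result, lift it into a stage with `Flow.fromFunction`, and perform the actual side effect only at the edge of the graph. In this sketch `Audit` is a hypothetical case class and `normalise` an invented example function; Scala 2.13 and Akka 2.6 are assumed.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical audit event; the deck names `Audit` but does not define it
final case class Audit(message: String)

object AuditedStage {
  // Pure: instead of logging, it returns the audit events next to its result
  def normalise(raw: String): (Seq[Audit], String) = {
    val trimmed = raw.trim
    val audits =
      if (trimmed != raw) Seq(Audit(s"trimmed whitespace from '$raw'"))
      else Seq.empty[Audit]
    (audits, trimmed.toLowerCase)
  }

  // Flow[In, (Seq[Audit], Out), Mat]: the stage merely lifts the pure function
  val normaliseFlow: Flow[String, (Seq[Audit], String), NotUsed] = Flow.fromFunction(normalise)

  // The effect (here: collecting the audits) happens once, at the edge of the graph
  def runDemo(inputs: List[String]): (Seq[Audit], Seq[String]) = {
    implicit val system: ActorSystem = ActorSystem("audited-stage")
    try {
      val results = Await.result(Source(inputs).via(normaliseFlow).runWith(Sink.seq), 3.seconds)
      (results.flatMap(_._1), results.map(_._2))
    } finally Await.result(system.terminate(), 3.seconds)
  }
}
```

Because `normalise` is pure, it can be unit-tested directly without materializing a stream.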
  11. SOURCE

        def propertySource(config: KafkaConfig): Source[PropertyChange, NotUsed] = {
          def settings(config: KafkaConfig): ConsumerSettings[String, PropertyChange] = ???

          val kafkaSrc: Source[ConsumerRecord[String, PropertyChange], Control] =
            Consumer.plainSource(settings(config), Subscriptions.topics(config.topic))

          kafkaSrc
            .map(_.value)
            .mapMaterializedValue { _ ⇒ NotUsed }
        }
  12. SOURCE - REACTIVE KAFKA

        def propertySource(config: KafkaConfig): Source[PropertyChange, NotUsed] = {
          def settings(config: KafkaConfig): ConsumerSettings[String, PropertyChange] = ???

          val kafkaSrc: Source[ConsumerRecord[String, PropertyChange], Control] =
            Consumer.plainSource(settings(config), Subscriptions.topics(config.topic))

          kafkaSrc
            .map(_.value)
            .mapMaterializedValue { _ ⇒ NotUsed }
        }
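`mapMaterializedValue { _ ⇒ NotUsed }` hides Reactive Kafka's `Control`, so `propertySource` has the same shape as any other `Source[PropertyChange, NotUsed]` (an in-memory source in tests, for instance). The sketch below shows the same move without a Kafka broker, using `Source.tick` (which materializes an `akka.actor.Cancellable`) as a stand-in; Scala 2.13 and Akka 2.6 are assumed.

```scala
import akka.NotUsed
import akka.actor.{ActorSystem, Cancellable}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object HidingMaterializedValue {
  // Source.tick materializes a Cancellable handle, much as Consumer.plainSource materializes a Control
  val ticks: Source[String, Cancellable] = Source.tick(0.millis, 10.millis, "tick")

  // take(3) makes the stream finite; mapMaterializedValue swaps the handle for NotUsed,
  // so callers see the same Source[T, NotUsed] shape that propertySource returns
  val hidden: Source[String, NotUsed] = ticks.take(3).mapMaterializedValue(_ => NotUsed)

  def runDemo(): Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("hiding-mat-value")
    try Await.result(hidden.runWith(Sink.seq), 3.seconds)
    finally Await.result(system.terminate(), 3.seconds)
  }
}
```

The trade-off is that the discarded handle is no longer available to the caller. With Kafka, that is the `Control` one would otherwise use to stop the consumer (e.g. `shutdown()`), so stopping the stream would need another mechanism, such as a `KillSwitch`.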
  14. PROCESSING FLOW

        def processingFlow(): Flow[PropertyChange, Either[ValidationError, AdwordsChange], NotUsed] = {
          val service: PropertyProcessingService = ???
          Flow.fromFunction(service.process)
        }
  15. PROCESSING FLOW

        def processingFlow(): Flow[PropertyChange, Either[ValidationError, AdwordsChange], NotUsed] = {
          val validationService: PropertyValidationService = ???
          val transformationService: PropertyTransformationService = ???
          Flow.fromFunction(validationService.validate)
            .map(transformationService.transform)
        }
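The slide does not show the signatures of `validate` and `transform`. Assuming `validate` returns `Either[ValidationError, PropertyChange]` and `transform` is a plain `PropertyChange => AdwordsChange` (both assumptions), the transformation has to be mapped over the `Right` side of each element. The case-class fields below are hypothetical; Scala 2.13 and Akka 2.6 are assumed.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical domain types: the deck names them but does not show their fields
final case class PropertyChange(id: String, bedrooms: Int)
final case class AdwordsChange(id: String, keyword: String)
final case class ValidationError(id: String, reason: String)

object ProcessingFlow {
  // Assumed shapes: validation can fail, transformation cannot
  def validate(p: PropertyChange): Either[ValidationError, PropertyChange] =
    if (p.bedrooms > 0) Right(p) else Left(ValidationError(p.id, "no bedrooms"))

  def transform(p: PropertyChange): AdwordsChange =
    AdwordsChange(p.id, s"${p.bedrooms} bedroom rental")

  // transform runs only on Rights; a Left passes through untouched
  val processingFlow: Flow[PropertyChange, Either[ValidationError, AdwordsChange], NotUsed] =
    Flow.fromFunction(validate).map(_.map(transform))

  def runDemo(in: List[PropertyChange]): Seq[Either[ValidationError, AdwordsChange]] = {
    implicit val system: ActorSystem = ActorSystem("processing-flow")
    try Await.result(Source(in).via(processingFlow).runWith(Sink.seq), 3.seconds)
    finally Await.result(system.terminate(), 3.seconds)
  }
}
```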
  16. PROCESSING FLOW

        def processingFlow(): Flow[PropertyChange, Either[ValidationError, AdwordsChange], NotUsed] = {
          val validationService: PropertyValidationService = ???
          val transformationService: PropertyTransformationService = ???
          Flow.fromFunction(validationService.validate)
            .async
            .map(transformationService.transform)
        }
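The only change in this version is `.async`, which introduces an asynchronous boundary: the stages before it run in their own actor and hand elements across through a buffer, so validation and transformation can execute concurrently on different threads. It does not change what the flow emits, as this sketch with invented `Int` stages checks (Scala 2.13 and Akka 2.6 assumed):

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object AsyncBoundary {
  val validate: Flow[Int, Int, NotUsed] = Flow[Int].filter(_ > 0)
  val transform: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 10)

  // Both stages fused into one actor
  val fused: Flow[Int, Int, NotUsed] = validate.via(transform)
  // .async closes an island around everything before it: validate and transform
  // now run in separate actors connected by a buffer
  val split: Flow[Int, Int, NotUsed] = validate.async.via(transform)

  def runDemo(): (Seq[Int], Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("async-boundary")
    val input = Source(List(3, -1, 2, 0, 5))
    try {
      val fusedOut = Await.result(input.via(fused).runWith(Sink.seq), 3.seconds)
      val splitOut = Await.result(input.via(split).runWith(Sink.seq), 3.seconds)
      (fusedOut, splitOut)
    } finally Await.result(system.terminate(), 3.seconds)
  }
}
```

An async boundary adds buffering and a cross-thread hand-off, so it tends to pay off only when the stages on either side do enough work to benefit from running in parallel.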
  17. STORING FLOW

        def adwordsFlow(config: AdwordsConfig): Flow[AdwordsChange, Either[StorageError, Stored], NotUsed] = {
          val service: AdwordsService = ???
          Flow[AdwordsChange].mapAsync(config.parallelism)(service.store)
        }
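`mapAsync(parallelism)` keeps up to `parallelism` calls to `service.store` in flight, yet still emits results in input order. In this sketch `store` is a hypothetical stand-in that uses `Thread.sleep` to simulate latency so that later elements finish first; Scala 2.13 and Akka 2.6 are assumed.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object StoringFlow {
  // Hypothetical store call: higher ids sleep less, so they complete first
  def store(id: Int)(implicit ec: ExecutionContext): Future[Either[String, Int]] = Future {
    Thread.sleep((5 - id) * 20L)
    if (id % 2 == 0) Right(id) else Left(s"rejected $id")
  }

  def storingFlow(parallelism: Int)(implicit ec: ExecutionContext): Flow[Int, Either[String, Int], NotUsed] =
    Flow[Int].mapAsync(parallelism)(id => store(id))

  def runDemo(): Seq[Either[String, Int]] = {
    implicit val system: ActorSystem = ActorSystem("storing-flow")
    import scala.concurrent.ExecutionContext.Implicits.global
    try Await.result(Source(1 to 4).via(storingFlow(parallelism = 4)).runWith(Sink.seq), 3.seconds)
    finally Await.result(system.terminate(), 3.seconds)
  }
}
```

If output order does not matter, `mapAsyncUnordered` emits each result as soon as its future completes.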
  18. ERROR SINK

        def errorSink(cfg: ErrorConfig): Sink[AdwordsStreamError, NotUsed] = {
          val service: AdwordsErrorService = ???
          Sink.foreach[AdwordsStreamError](service.handle)
            .mapMaterializedValue(_ ⇒ NotUsed)
        }
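Mapping the `Future[Done]` of `Sink.foreach` to `NotUsed` fits a sink that is only ever embedded inside the graph. When the error sink is run or tested on its own, keeping the `Future[Done]` is useful because it tells you when every error has been handled. A sketch with a hypothetical `StreamError` type and an in-memory queue in place of `AdwordsErrorService` (Scala 2.13 and Akka 2.6 assumed):

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.jdk.CollectionConverters._

// Hypothetical error type standing in for AdwordsStreamError
final case class StreamError(id: String, reason: String)

object ErrorSink {
  // In-memory stand-in for AdwordsErrorService.handle
  val handled = new ConcurrentLinkedQueue[StreamError]()

  // Keeping Future[Done] lets a caller observe when every error has been handled
  val errorSink: Sink[StreamError, Future[Done]] =
    Sink.foreach[StreamError] { e => handled.add(e); () }

  def runDemo(errors: List[StreamError]): List[StreamError] = {
    implicit val system: ActorSystem = ActorSystem("error-sink")
    try {
      Await.result(Source(errors).runWith(errorSink), 3.seconds)
      handled.asScala.toList
    } finally Await.result(system.terminate(), 3.seconds)
  }
}
```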
  19. GRAPH

        def graph(source: Source[PropertyChange, NotUsed],
                  process: Flow[PropertyChange, Either[ValidationError, AdwordsChange], NotUsed],
                  store: Flow[AdwordsChange, Either[StorageError, Stored], NotUsed],
                  errorSink: Sink[AdwordsStreamError, NotUsed]): RunnableGraph[Future[Done]] = {
          val processAndDivert: Flow[PropertyChange, AdwordsChange, NotUsed] =
            process via DivertErrors(to = errorSink)
          val storeAndDivert: Flow[AdwordsChange, Stored, NotUsed] =
            store via DivertErrors(to = errorSink)

          source
            .via(processAndDivert)
            .via(storeAndDivert)
            .toMat(Sink.ignore)(Keep.right)
        }
  23. DIVERT ERRORS

    Flow[Either[E, T], T, M]: each incoming Either[E, T] is split; values of type T continue downstream, while errors of type E are diverted to a Sink[E, M].
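  24. GRAPH - DIVERT ERRORS

        import akka.stream.FlowShape
        import akka.stream.scaladsl.{Flow, GraphDSL, Partition, Sink}

        def divertErrors[T, E, M](to: Sink[E, M]): Flow[Either[E, T], T, M] = {
          Flow.fromGraph(GraphDSL.create(to) { implicit b ⇒ sink ⇒
            import GraphDSL.Implicits._ // brings the ~> operator into scope

            // Lefts go to output port 0, Rights to output port 1
            val partition = b.add(Partition[Either[E, T]](2, _.fold(_ ⇒ 0, _ ⇒ 1)))
            val left = b.add(Flow[Either[E, T]].map(_.left.get))
            val right = b.add(Flow[Either[E, T]].map(_.right.get))

            partition ~> left ~> sink // errors are diverted to the side sink
            partition ~> right        // successes continue downstream
            FlowShape(partition.in, right.out)
          })
        }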
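Later Akka versions (2.6 is assumed here, with Scala 2.13) also offer built-in `divertTo`/`divertToMat` operators, which can express the same contract as `divertErrors` without hand-written `GraphDSL`. This is a swapped-in alternative, not the deck's implementation; `divertLefts` is a hypothetical name.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

object DivertWithBuiltIn {
  // Same contract as divertErrors: Lefts go to `to`, Rights continue downstream,
  // and the side sink's materialized value M is kept
  def divertLefts[E, T, M](to: Sink[E, M]): Flow[Either[E, T], T, M] = {
    val errorsOnly: Sink[Either[E, T], M] =
      Flow[Either[E, T]].collect { case Left(e) => e }.toMat(to)(Keep.right)
    Flow[Either[E, T]]
      .divertToMat(errorsOnly, _.isLeft)(Keep.right)
      .collect { case Right(t) => t }
  }

  def runDemo(in: List[Either[String, Int]]): (Seq[String], Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("divert-lefts")
    try {
      val divert: Flow[Either[String, Int], Int, Future[Seq[String]]] =
        divertLefts[String, Int, Future[Seq[String]]](Sink.seq[String])
      val (errors, values) =
        Source(in).viaMat(divert)(Keep.right).toMat(Sink.seq[Int])(Keep.both).run()
      (Await.result(errors, 3.seconds), Await.result(values, 3.seconds))
    } finally Await.result(system.terminate(), 3.seconds)
  }
}
```

As with `Partition`, if the error sink backpressures, so does the main flow.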
  25. DOMAIN

    - PropertyProcessingService: def process(p: PropertyChange): Either[ValidationError, AdwordsChange]
    - AdwordsService: def store(p: AdwordsChange): Future[Either[StorageError, Stored]]
    - AdwordsErrorService: def handle(p: AdwordsError): Unit
    - Domain types: PropertyChange, AdwordsChange, ValidationError, Stored, StorageError
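The domain layer needs no Akka at all, which keeps it easy to test. A minimal sketch of two of the services listed above in plain Scala: the service signatures follow the slide, while every case-class field and both in-memory implementations are assumptions made for illustration. `AdwordsErrorService` is left out because the slide does not define `AdwordsError`.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical fields; the deck lists only the type names
final case class PropertyChange(id: String, bedrooms: Int)
final case class AdwordsChange(id: String, keyword: String)
final case class ValidationError(id: String, reason: String)
final case class Stored(id: String)
final case class StorageError(id: String, reason: String)

// Signatures as listed on the DOMAIN slide
trait PropertyProcessingService {
  def process(p: PropertyChange): Either[ValidationError, AdwordsChange]
}
trait AdwordsService {
  def store(p: AdwordsChange): Future[Either[StorageError, Stored]]
}

// Illustrative implementations, testable without Akka or Kafka
object DefaultProcessing extends PropertyProcessingService {
  def process(p: PropertyChange): Either[ValidationError, AdwordsChange] =
    if (p.bedrooms > 0) Right(AdwordsChange(p.id, s"${p.bedrooms} bedroom rental"))
    else Left(ValidationError(p.id, "no bedrooms"))
}

final class InMemoryAdwords(implicit ec: ExecutionContext) extends AdwordsService {
  private val stored = TrieMap.empty[String, AdwordsChange]

  def store(p: AdwordsChange): Future[Either[StorageError, Stored]] = Future {
    if (p.keyword.isEmpty) Left(StorageError(p.id, "empty keyword"))
    else { stored.update(p.id, p); Right(Stored(p.id)) }
  }

  def contents: Map[String, AdwordsChange] = stored.toMap
}
```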
  26. APPLICATION

    - Failure Handling
    - Config Management
    - Materialization
    - Stage Creation
  27. APPLICATION

        val cfg: ApplicationConfig = ???
        val source = propertySource(cfg.kafka)
        val pFlow = processingFlow()
        val sFlow = adwordsFlow(cfg.adwords)
        val eSink = errorSink(cfg.error)
        val graph = AdwordsGraph.graph(source, pFlow, sFlow, eSink)
        graph.run()
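`graph.run()` returns the graph's `Future[Done]`. Under the default supervision strategy an exception thrown inside a stage fails the stream, and that failure surfaces through this future, so the application layer is a natural place to observe it. A sketch with a hypothetical `runAndReport` helper and toy graphs (Scala 2.13 and Akka 2.6 assumed):

```scala
import akka.Done
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

object Application {
  // Materializes the graph and turns its outcome into a report
  def runAndReport(graph: RunnableGraph[Future[Done]])(implicit system: ActorSystem): Future[String] = {
    import system.dispatcher
    graph.run()
      .map(_ => "completed")
      .recover { case e => s"failed: ${e.getMessage}" }
  }

  def demo(): (String, String) = {
    implicit val system: ActorSystem = ActorSystem("application")
    val ok = Source(1 to 3).toMat(Sink.ignore)(Keep.right)
    val broken = Source(1 to 3)
      .map(n => if (n == 2) throw new IllegalStateException("boom") else n)
      .toMat(Sink.ignore)(Keep.right)
    try {
      (Await.result(runAndReport(ok), 3.seconds), Await.result(runAndReport(broken), 3.seconds))
    } finally Await.result(system.terminate(), 3.seconds)
  }
}
```

In a long-running service, the same callback is also a reasonable place to log the failure and terminate the `ActorSystem`.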
  28. APPLICATION - CONFIGURATION

        kafka {
          bootstrapServers = "my.kafka"
          consumerGroup = "my-consumer"
          topic = "properties"
        }
        adwords {
          parallelism = 5
          endpoint = "http://adwords.google.com/properties"
        }
        error {
          loggerName = "adwordsErrors"
        }
  29. Layers

    - Application layer: Graph
    - Domain layer: Events, Services, Repositories, Factories
    - Reactive layer: Sources, Flows, Sinks, Other
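The configuration on slide 28 is HOCON, which suggests it is read with Typesafe Config (a transitive dependency of Akka). A sketch that loads it into case classes matching the `cfg.kafka`, `cfg.adwords` and `cfg.error` fields used on slide 27; the case classes and the `from` helper are assumptions, not code from the deck.

```scala
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical case classes mirroring the kafka / adwords / error blocks
final case class KafkaConfig(bootstrapServers: String, consumerGroup: String, topic: String)
final case class AdwordsConfig(parallelism: Int, endpoint: String)
final case class ErrorConfig(loggerName: String)
final case class ApplicationConfig(kafka: KafkaConfig, adwords: AdwordsConfig, error: ErrorConfig)

object ApplicationConfig {
  def from(root: Config): ApplicationConfig = {
    val k = root.getConfig("kafka")
    val a = root.getConfig("adwords")
    ApplicationConfig(
      KafkaConfig(k.getString("bootstrapServers"), k.getString("consumerGroup"), k.getString("topic")),
      AdwordsConfig(a.getInt("parallelism"), a.getString("endpoint")),
      ErrorConfig(root.getString("error.loggerName"))
    )
  }

  // The slide's configuration, inlined for the example
  val example: String =
    """kafka { bootstrapServers = "my.kafka", consumerGroup = "my-consumer", topic = "properties" }
      |adwords { parallelism = 5, endpoint = "http://adwords.google.com/properties" }
      |error { loggerName = "adwordsErrors" }""".stripMargin
}
```

In an application, `ConfigFactory.load()` would typically read the same structure from `application.conf` instead of an inline string.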