Slide 2

Slide 2 text

Staging Reactive data pipelines using Kafka as the backbone
Jaakko Pallari (@lepovirta), Simon Souter (@simonsouter)
/cakesolutions /scala-kafka-client

Slide 3

Slide 3 text

Reactive Solutions at Cake
MANCHESTER, LONDON, NEW YORK

Slide 4

Slide 4 text

Contents
1. Reactive Data Pipelines
2. Kafka as a Reactive Message Queue
3. Architecture & Consumer Patterns
4. Streaming Application Development

Slide 5

Slide 5 text

Stream Processing
● Big Data
● Processing in Real-time
● Event Throughput vs Number of Queries
● IoT
[Diagram: Source, Service, Sink]

Slide 6

Slide 6 text

Distributed Streaming Engines
● Server Applications
● Stream topologies deployed to cluster
● Framework design

Slide 7

Slide 7 text

Streaming from ground-up
● Custom Streaming Applications
● Leverage existing tool stack
[Diagram: Source, Application, Sink]

Slide 8

Slide 8 text

Staged data pipelines
● Staged Event Driven Architecture
● Processes separated by a queue
● Processing in stages
[Diagram: Process, Queue, Process, Queue, Queue]
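The staged arrangement on this slide can be sketched in a few lines. This is a minimal illustration, assuming in-process threads and `queue.Queue` as stand-ins for separate processes and a real message queue; the names `stage`, `run_pipeline` and `STOP` are hypothetical and not taken from the talk.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream (assumption of this sketch)

def stage(transform, inbox, outbox):
    """Run one processing stage: read from inbox, transform, write to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        outbox.put(transform(item))

def run_pipeline(items):
    """Push items through two stages that only communicate via queues."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=stage, args=(lambda x: x + 1, q_in, q_mid)),
        threading.Thread(target=stage, args=(lambda x: x * 10, q_mid, q_out)),
    ]
    for worker in workers:
        worker.start()
    for item in items:
        q_in.put(item)
    q_in.put(STOP)

    results = []
    while True:
        item = q_out.get()
        if item is STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

Each stage knows only its inbox and outbox, so a stage can be replaced or slowed down without the others needing to change.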

Slide 9

Slide 9 text

Reactive data pipelines
● Responsive
● Resilient
● Elastic
● Message Driven
[Diagram: Source, Process, Queue, Process, Sink]

Slide 10

Slide 10 text

Streaming from ground-up
● Microservices as processing components
[Diagram: Source, Microservice 1 (x3), Queue, Microservice 2 (x3), Sink]

Slide 11

Slide 11 text

Streaming from ground-up
● Deployment via cluster orchestration services
[Diagram: Source, Microservice 1 (x3), Queue, Microservice 2 (x3), Sink, Orchestration Service; label: Scale up]

Slide 12

Slide 12 text

Streaming from ground-up
● Messaging middleware for resilient data distribution between microservices
[Diagram: Source, Microservice 1 (x3), Queue, Microservice 2 (x3), Sink]

Slide 13

Slide 13 text

What is Kafka?
● Distributed Message Broker
● Supports Parallel Streaming
● Kafka as a Reactive MQ
[Diagram: Source, Microservice 1 (x3), Kafka, Microservice 2 (x3), Sink]

Slide 14

Slide 14 text

Kafka: topic and message anatomy
Message Driven
[Diagram: Kafka Topic “Electric_Readings”; message with Key: “meter1”, Value: 1.34; labels: Electric Bill Calculation, Auditing]

Slide 15

Slide 15 text

Kafka: at-least-once delivery
Resilient
[Diagram: Electric meter, Kafka Topic “Electric_Readings”, Consumption Aggregator; labels: Deliver, ACK (twice)]
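At-least-once delivery means a message is redelivered until it is acknowledged, so a consumer may see the same message more than once. The sketch below is a simplified model of that idea only, not the Kafka protocol; `deliver_at_least_once` and `FlakyAggregator` are hypothetical names, and the "lost ACK" is simulated.

```python
def deliver_at_least_once(messages, handler, max_attempts=5):
    """Redeliver each message until the handler acknowledges it (returns True)."""
    delivery_log = []
    for msg in messages:
        for _attempt in range(max_attempts):
            delivery_log.append(msg)
            if handler(msg):  # True stands for a received ACK
                break
        else:
            raise RuntimeError(f"no ACK for {msg!r} after {max_attempts} attempts")
    return delivery_log

class FlakyAggregator:
    """Hypothetical consumer whose first ACK for every message gets lost."""

    def __init__(self):
        self.seen = []         # every message the consumer processed
        self._dropped = set()  # messages whose first ACK was dropped

    def __call__(self, msg):
        self.seen.append(msg)  # the message is processed either way
        if msg not in self._dropped:
            self._dropped.add(msg)
            return False       # simulate the ACK being lost
        return True
```

Because the message is processed before the ACK is lost, the consumer handles it twice. Downstream logic therefore has to tolerate duplicates.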

Slide 16

Slide 16 text

Kafka: clustering - arrangement
Elastic
[Diagram: Kafka node 1, Kafka node 2, Kafka Topic, Partition 1, Partition 2]

Slide 17

Slide 17 text

Kafka: clustering - replication
Resilient
[Diagram: Kafka node 1, Kafka node 2, Kafka Topic, Partition 1, Partition 2, Partition 1 Replica, Partition 2 Replica]
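The point of the replication slide is that a partition and its replica live on different nodes. A minimal sketch of such a placement follows, assuming a simple rotation over the node list; Kafka's real replica assignment is more involved, and `place_partitions` is a hypothetical helper.

```python
def place_partitions(num_partitions, nodes, replication_factor=2):
    """Map each partition to a list of distinct nodes; the first entry hosts the primary copy."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for partition in range(1, num_partitions + 1):
        placement[partition] = [
            nodes[(partition - 1 + replica) % len(nodes)]
            for replica in range(replication_factor)
        ]
    return placement
```

With two nodes and two partitions, each node ends up with one primary copy and one replica of the other partition, so losing either node leaves a full copy of the topic.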

Slide 18

Slide 18 text

Kafka: clustering - consumer
Responsive
[Diagram: Kafka Topic with Partition #1, Partition #2, Partition #3; Consumer #1, Consumer #2, Consumer #3; label: Same consumer group]

Slide 19

Slide 19 text

Kafka: clustering - consumer
Responsive
[Diagram: Kafka Topic with Partition #1, Partition #2, Partition #3; Consumer #1, Consumer #2, Consumer #3]

Slide 23

Slide 23 text

Kafka: clustering - consumer
Responsive
[Diagram: Kafka Topic with Partition #1, Partition #2, Partition #3; Consumer #1, Consumer #2, Consumer #3, Consumer #4; label: No Data]
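The consumer slides show partitions being shared out among the members of one consumer group, with a fourth consumer receiving no data when there are only three partitions. A rough sketch of that behaviour follows, assuming a plain round-robin split; Kafka ships several assignment strategies, and this hypothetical `assign_partitions` does not reproduce any of them exactly.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer of the group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment
```

For example, `assign_partitions([1, 2, 3], ["c1", "c2", "c3", "c4"])` leaves `"c4"` with an empty list. The partition count therefore caps the useful size of a consumer group.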

Slide 24

Slide 24 text

Kafka: high throughput
● Single partition consumer: 20-90 Mb/sec
Responsive

Slide 25

Slide 25 text

Kafka the Reactive MQ
Message Driven
● Key-value messages
Responsive
● Consumer clustering
● High throughput
Resilient
● At-least-once delivery
● Replication
Elastic
● Linear scalability

Slide 26

Slide 26 text

Kafka consumer patterns
[Diagram: Source, Microservice 1 (x3), Kafka, Microservice 2 (x3), Sink]

Slide 27

Slide 27 text

Simple message queue
Kafka Terminology:
- Partition Count: 1
[Diagram: Electric Meter, Electric Readings (Partition, Partition replica x2), Auditing]

Slide 28

Slide 28 text

Simple message queue - fanout
Kafka Terminology:
- Partition Count: 1
- Multiple Consumer Groups
[Diagram: Electric Meter, Electric Readings (Partition, Partition replica x2), Auditing, Billing]
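Fanout works because every consumer group keeps its own read position, so Auditing and Billing each see the full stream. The sketch below models a single-partition topic in memory to illustrate this; `SinglePartitionTopic` is a hypothetical class, not part of any Kafka client.

```python
class SinglePartitionTopic:
    """In-memory stand-in for a topic with one partition and per-group offsets."""

    def __init__(self):
        self.log = []      # append-only list of (key, value) records
        self.offsets = {}  # consumer group name -> next offset to read

    def publish(self, key, value):
        self.log.append((key, value))

    def poll(self, group, max_records=10):
        """Return the next records for this group and advance only its offset."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch
```

Reading as the `"auditing"` group does not consume anything from the point of view of the `"billing"` group. Each group simply advances its own offset over the shared log.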

Slide 29

Slide 29 text

Simple message queue - consumer
1. Consume a batch of messages from Kafka
2. Process messages and send results to wherever necessary (e.g. another Kafka topic)
3. Confirm delivery to Kafka
Kafka Terminology:
- Commit Mode: Manual
[Diagram: Kafka Partition, Auditing Service (Consumer Client, App logic), DB]
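The three steps above can be sketched as a loop. This is a minimal illustration against an in-memory stand-in rather than a real Kafka client; `InMemoryPartition` and `run_consumer` are hypothetical names, and the committed offset is taken to mean "next offset to read".

```python
class InMemoryPartition:
    """Hypothetical stand-in for one partition plus one group's committed offset."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # next offset to read after a restart

    def poll(self, max_records):
        """Return (start_offset, batch) beginning at the committed offset."""
        start = self.committed
        return start, self.records[start:start + max_records]

    def commit(self, next_offset):
        self.committed = next_offset

def run_consumer(partition, process, batch_size=2):
    while True:
        start, batch = partition.poll(batch_size)  # 1. consume a batch
        if not batch:
            return
        for record in batch:                       # 2. process each message
            process(record)
        partition.commit(start + len(batch))       # 3. confirm only after processing
```

Committing after processing is what gives at-least-once behaviour: a crash between steps 2 and 3 causes the batch to be read again rather than lost.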

Slide 30

Slide 30 text

Kafka: message confirmation
● Messages confirmed by offset (not individually)
Kafka Terminology:
- Commit Mode: Manual
[Diagram: Partition, Commit point, Consumer, Consumed]

Slide 31

Slide 31 text

Kafka: message confirmation
● Messages confirmed by offset (not individually)
Kafka Terminology:
- Commit Mode: Manual
[Diagram: Partition, Commit point, Consumer, Commit, Consumed]
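Confirming by offset means a single number covers every earlier message in the partition. The small sketch below shows the consequence after a restart, assuming the committed offset is the next offset to read; both function names are hypothetical.

```python
def resume_from(log, committed_offset):
    """Records a restarted consumer will read: everything from the commit point on."""
    return log[committed_offset:]

def redelivered(log, committed_offset, consumed_count):
    """Records that were consumed before the restart but never covered by a commit."""
    return log[committed_offset:consumed_count]
```

If a consumer has read four records but committed offset 2, the records at offsets 2 and 3 are delivered again after a restart. There is no way to confirm offset 3 without also confirming offset 2.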

Slide 32

Slide 32 text

Parallel workers
Kafka Terminology:
- Partition Count: >1
- Single Consumer Group
[Diagram: Electric Meter (x3), Electric Readings (Partition #1, Partition #2, Partition #N), Auditing node #1, Auditing node #2, Auditing node #N]

Slide 33

Slide 33 text

Consumer for parallel processing
● Same arrangement from consumer perspective
Kafka Terminology:
- Partition Count: >1
- Commit Mode: Manual
[Diagram: Kafka Partition (x3), Auditing Service (Consumer Client, App logic), DB]

Slide 34

Slide 34 text

Orchestration
● Provide Scaling Capability
● Restart or replace failed nodes
[Diagram: Electric Meter (x3), Electric Readings (Partition #1, Partition #2, Partition #N), Auditing node #1, Auditing node #2, Auditing node #N, Mesos/Marathon; label: New node]

Slide 35

Slide 35 text

Stateful Processing
● Example: Average electricity consumption per meter for the last hour
[Diagram: Electric Meter (x3), Electric Readings (Partition x3), Aggregation]
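The example on this slide, an average per meter over the last hour, needs state that outlives a single message. A minimal sketch of that state follows, assuming readings arrive in timestamp order per meter and that timestamps are plain seconds; `HourlyAverage` is a hypothetical name.

```python
from collections import defaultdict, deque

class HourlyAverage:
    """Keep a sliding window of readings per meter and report the window average."""

    def __init__(self, window_seconds=3600):
        self.window = window_seconds
        self.readings = defaultdict(deque)  # meter -> deque of (timestamp, value)

    def add(self, meter, timestamp, value):
        window = self.readings[meter]
        window.append((timestamp, value))
        # Evict readings that are a full window (or more) older than the newest one.
        while window and window[0][0] <= timestamp - self.window:
            window.popleft()

    def average(self, meter):
        window = self.readings.get(meter)
        if not window:
            return None
        return sum(value for _, value in window) / len(window)
```

The `readings` dictionary is exactly the kind of in-memory state that is lost when a node fails.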

Slide 36

Slide 36 text

Stream and state
● Data locality
[Diagram: Electric Readings, Aggregator for Partition #1, Aggregator for Partition #2]

Slide 37

Slide 37 text

Stream and state
● Data locality
[Diagram: Electric Readings, Aggregator for Partition #1 (Key: "meter 1", Value: 9.2), Aggregator for Partition #2 (Key: "meter 2", Value: 2.7)]
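Data locality here relies on every message with the same key landing in the same partition, so one aggregator sees all readings for a given meter. The sketch below illustrates key-based routing; CRC32 is a stand-in chosen for this example and is not the hash Kafka's own partitioner uses, and `partition_for` and `route` are hypothetical names.

```python
import zlib

def partition_for(key, num_partitions):
    """Deterministically map a message key to a partition index."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def route(readings, num_partitions):
    """Group (key, value) readings by the partition their key hashes to."""
    partitions = {index: [] for index in range(num_partitions)}
    for key, value in readings:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Because the mapping depends only on the key, the aggregator that owns a partition can keep the state for its meters locally without coordinating with other aggregators.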

Slide 38

Slide 38 text

Fault tolerance
● State persistence and recovery
[Diagram: Electric Readings, Aggregator for Partition #1, Aggregator for Partition #2]

Slide 39

Slide 39 text

Fault tolerance
● State persistence and recovery
[Diagram: Electric Readings, Aggregator for Partition #1, Aggregator for Partition #2, Persistence]

Slide 40

Slide 40 text

Stateful Processing app
[Diagram: Kafka Partition (x3), Aggregation Service (Consumer Client, Aggregation logic), Persistence (Kafka/DB/?)]

Slide 41

Slide 41 text

Stateful Processing app
Duplicated message processing after recovery.
[Diagram: Kafka Partition (x3), Aggregation Service (Consumer Client, Aggregation logic) (x2), Persistence (Kafka/DB/?)]

Slide 42

Slide 42 text

Stateful Processing app
Persist state with partition offsets
Don't commit! Just fetch more data
Kafka Terminology:
- Commit Mode: Self Managed Offsets
[Diagram: Kafka Partition (x3), Aggregation Service (Consumer Client, Aggregation logic), Persistence (Kafka/DB/?)]
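Persisting the state together with the partition offset means recovery restores both at once, so no message is counted twice. The sketch below illustrates this with a JSON file as the persistence layer, which is an assumption of this example (the slide leaves the store open as "Kafka/DB/?"); all function names are hypothetical.

```python
import json
import os
import tempfile

def save_snapshot(path, state, next_offset):
    """Write state and offset together, then atomically replace the old snapshot."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"state": state, "next_offset": next_offset}, handle)
    os.replace(tmp_path, path)

def load_snapshot(path):
    """Return (state, next_offset); start from scratch if no snapshot exists."""
    if not os.path.exists(path):
        return {}, 0
    with open(path) as handle:
        snapshot = json.load(handle)
    return snapshot["state"], snapshot["next_offset"]

def aggregate(log, path, stop_after=None):
    """Sum values per key, resuming from the offset stored alongside the state."""
    state, offset = load_snapshot(path)
    processed = 0
    while offset < len(log) and (stop_after is None or processed < stop_after):
        key, value = log[offset]
        state[key] = state.get(key, 0) + value
        offset += 1
        processed += 1
        save_snapshot(path, state, offset)  # state and offset move together
    return state
```

Stopping the aggregator partway through with `stop_after` and calling `aggregate` again on the same path resumes at the stored offset. Nothing is committed back to the broker, since the snapshot itself is the source of truth for the read position.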

Slide 43

Slide 43 text

Stateful Processing architecture
● Dynamic partition assignment
● Shared Persistence for State
[Diagram: Aggregator 1, Aggregator 2, Aggregator 3, partition labels (#1, #1, #4, #6, #1, #2), Persistence (Kafka/DB/?), Orchestration Service]

Slide 45

Slide 45 text

Streaming Patterns
Stateful Processing
● Self-managed processing state
Single Partition Topic
● Strong ordering guarantees
● Limited failure recovery
● Scalability is limited
Multi Partition Topic
● Parallel processing
● Limited ordering guarantees
● Kafka managed processing state
Fanout
● Independent consumer groups

Slide 46

Slide 46 text

Kafka libraries
● Kafka client support in many languages
● Scala, Java, C
● C bindings -> Haskell, OCaml, Python etc.
[Diagram: Source, Microservice 1 (x3), Kafka, Microservice 2 (x3), Sink]

Slide 47

Slide 47 text

Reactive Streaming APIs
● Similar paradigm as in real-time streaming platforms
● Reactive Kafka
  ○ Based on Akka Reactive Streams API
  ○ Scala + Java
  ○ Developed by Akka team
● Kafka Streams
  ○ Official streaming API for Kafka
  ○ Java
  ○ Developed by Confluent

Slide 48

Slide 48 text

scala-kafka-client
● Kafka client developed for Scala
● Async and non-blocking
● Built on top of the official Java driver
● Easy API with high performance
/cakesolutions /scala-kafka-client

Slide 49

Slide 49 text

scala-kafka-client
● Leverage extensive Akka feature set
● Processing logic implemented using Actor Model
[Diagram: Kafka, Kafka Consumer Actor, Receiver Actor, Kafka Producer Actor, Kafka]
/cakesolutions /scala-kafka-client

Slide 50

Slide 50 text

Summary
● Leverage Microservice based techniques.
● Streaming topologies can be varied and complex
  ○ Many use-cases fall under a small set of consumer patterns.
● Challenges around scalable and reactive data pipelines
● Kafka provides first-class support for reactive streaming to your applications.
● Stateful processing remains a challenging area.

Slide 51

Slide 51 text

We didn’t discuss...
● Data serialisation
● Application rolling updates
● Complex streaming topologies

Slide 52

Slide 52 text

Questions?
MANCHESTER, LONDON, NEW YORK
/cakesolutions /scala-kafka-client
@cakesolutions
+44 845 617 1200
[email protected]