Slide 1

Slide 1 text

1 Building a Streaming Platform with Kafka Pere Urbón-Bayes Technical Architect (TAM) [email protected]

Slide 2

Slide 2 text

2 Topics • Setting the stage • Introducing the key concepts (Kafka Broker, Connect and KStreams) • Using events for notifications and state transfer • Conclusion

Slide 3

Slide 3 text

3 Kafka & Confluent

Slide 4

Slide 4 text

4 Is Kafka a Streaming Platform? The Log Connectors Connectors Producer Consumer Streaming Engine

Slide 5

Slide 5 text

5 What exactly is Stream Processing? authorization_attempts → possible_fraud

Slide 6

Slide 6 text

6 What exactly is Stream Processing? authorization_attempts → possible_fraud

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

Slide 7

Slide 7 text

7 What exactly is Stream Processing? authorization_attempts → possible_fraud

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

Slide 8

Slide 8 text

8 What exactly is Stream Processing? authorization_attempts → possible_fraud

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

Slide 9

Slide 9 text

9 What exactly is Stream Processing? authorization_attempts → possible_fraud

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

Slide 10

Slide 10 text

10 What exactly is Stream Processing? authorization_attempts → possible_fraud

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

Slide 11

Slide 11 text

11 What exactly is Stream Processing? authorization_attempts → possible_fraud

CREATE STREAM possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

Slide 12

Slide 12 text

12 Streaming is the toolset for dealing with events as they move!

Slide 13

Slide 13 text

13 Looking more closely: What is a Streaming Platform? The Log Connectors Connectors Producer Consumer Streaming Engine

Slide 14

Slide 14 text

14 Looking more closely: Kafka’s Distributed Log The Log Connectors Connectors Producer Consumer Streaming Engine

Slide 15

Slide 15 text

15 Kafka’s Distributed Log: a durable messaging system. Kafka is similar to a traditional messaging system (ActiveMQ, RabbitMQ, ...) but with: • Better scalability • Fault tolerance • High availability • Better storage

Slide 16

Slide 16 text

16 The log is a simple idea: messages are always appended at the end (old → new)
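
To make the append-only idea concrete, here is a minimal producer sketch (not part of the original deck); the broker address, topic name and String serializers are assumptions, and imports are omitted as on the other code slides.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// Each send() appends the record at the end of the log (of its partition).
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
  producer.send(new ProducerRecord<>("events", "a new message"));
}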

Slide 17

Slide 17 text

17 Consumers have a position all of their own: Sally, George and Fred each scan forward from their own position in the log (old → new)

Slide 18

Slide 18 text

18 Only sequential access: read to an offset and scan forward (old → new)
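
A minimal consumer sketch (not in the deck) of the "read to offset & scan" pattern; the topic, partition and starting offset are illustrative, and imports are omitted as on the other code slides.

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
  TopicPartition partition = new TopicPartition("events", 0);
  consumer.assign(Collections.singletonList(partition));
  consumer.seek(partition, 42L);                                  // read to offset...
  for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
    System.out.println(record.offset() + ": " + record.value());  // ...then scan forward sequentially
  }
}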

Slide 19

Slide 19 text

19 Scaling Out

Slide 20

Slide 20 text

20 Shard data to get scalability: messages from many producers are sent to different partitions, and partitions live on different machines of the cluster
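
A small sketch (not in the deck) of how records are sharded: with the default partitioner, the record key is hashed to choose the partition, so records with the same key always land in the same partition. It reuses the producer from the earlier sketch; the topic and key values are made up.

producer.send(new ProducerRecord<>("authorization_attempts", "card-4711", "attempt at 12:01"));
producer.send(new ProducerRecord<>("authorization_attempts", "card-4711", "attempt at 12:02")); // same key, same partition, in order
producer.send(new ProducerRecord<>("authorization_attempts", "card-0815", "attempt at 12:03")); // different key, possibly another partition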

Slide 21

Slide 21 text

21 Replicate to get fault tolerance: the leader on Machine A replicates each msg to Machine B
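
As a hedged sketch (not in the deck), a topic's replication factor is set when the topic is created, for example with the AdminClient; the topic name and sizing below are assumptions.

public static void main(String[] args) throws Exception {
  Properties props = new Properties();
  props.put("bootstrap.servers", "localhost:9092");

  try (AdminClient admin = AdminClient.create(props)) {
    // 6 partitions for scalability, replication factor 3 for fault tolerance:
    // each partition gets a leader plus two replicas on other brokers.
    NewTopic topic = new NewTopic("authorization_attempts", 6, (short) 3);
    admin.createTopics(Collections.singletonList(topic)).all().get();
  }
}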

Slide 22

Slide 22 text

22 Replication provides resiliency: a ‘replica’ takes over on machine failure

Slide 23

Slide 23 text

23 Linearly Scalable Architecture. A single topic can have many producer machines, many consumer machines and many broker machines: no bottleneck!

Slide 24

Slide 24 text

24 Clusters can be connected to provide worldwide, localized views (NY, London and Tokyo clusters linked by Replicator)

Slide 25

Slide 25 text

25 The Connect API The Log Connectors Connectors Producer Consumer Streaming Engine

Slide 26

Slide 26 text

26 Ingest / Egest into practically any data source with Kafka Connect
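
As an illustration (not from the deck), a source connector is configured rather than coded; the sketch below shows roughly what a JDBC source configuration can look like, with the connection URL, table and topic prefix as placeholders.

name=orders-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
# placeholder database; rows from the "orders" table land in the topic "jdbc-orders"
connection.url=jdbc:postgresql://localhost:5432/shop
table.whitelist=orders
mode=incrementing
incrementing.column.name=id
topic.prefix=jdbc-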

Slide 27

Slide 27 text

27 List of Kafka Connect sources and sinks (and more…) Amazon S3 Elasticsearch HDFS JDBC Couchbase Cassandra Oracle SAP Vertica Blockchain JMX Kinesis MongoDB MQTT NATS Postgres Rabbit Redis Twitter DynamoDB FTP Github BigQuery Google Pub Sub RethinkDB Salesforce Solr Splunk

Slide 28

Slide 28 text

28 The Kafka Streams API / KSQL The Log Connectors Connectors Producer Consumer Streaming Engine

Slide 29

Slide 29 text

29 Engine for Continuous Computation

SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;

Slide 30

Slide 30 text

30 But it’s just an API

public static void main(String[] args) {
  StreamsBuilder builder = new StreamsBuilder();

  builder.stream("caterpillars")
         .map((k, v) -> coolTransformation(k, v))
         .to("butterflies");

  new KafkaStreams(builder.build(), props()).start();
}
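
For comparison, here is a rough sketch, not from the original deck, of the earlier 5-minute fraud query expressed with the Kafka Streams DSL. The topic names follow the KSQL example; the String/Long serdes, the keying of the input by card_number and the surrounding setup are assumptions.

StreamsBuilder builder = new StreamsBuilder();

builder.stream("authorization_attempts",
               Consumed.with(Serdes.String(), Serdes.String()))  // assumed: key = card_number
       .groupByKey()
       .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))        // tumbling 5-minute window
       .count()                                                  // count(*) per card per window
       .toStream()
       .filter((windowedCard, attempts) -> attempts > 3)         // HAVING count(*) > 3
       .map((windowedCard, attempts) -> KeyValue.pair(windowedCard.key(), attempts))
       .to("possible_fraud", Produced.with(Serdes.String(), Serdes.Long()));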

Slide 31

Slide 31 text

31 Join Streams and Tables: in Kafka a regular topic backs the Stream and a compacted topic backs the Table, and Kafka Streams / KSQL performs the join

Slide 32

Slide 32 text

32 Windows / Retention – Handle Late Events. The asynchronous dilemma: who was first, the order or the payment? (Orders and Payments are joined by key in Kafka with a 5-minute buffer before reaching the Emailer)

Slide 33

Slide 33 text

33 Windows / Retention – Handle Late Events (Orders and Payments joined by key with a 5-minute buffer, then sent to the Emailer)

KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");

orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
      .peek((key, pair) -> emailer.sendMail(pair));

Slide 34

Slide 34 text

34 A KTable is just a stream with infinite retention (Orders and Payments joined with the Customers table in Kafka, results sent to the Emailer)

Slide 35

Slide 35 text

35 A KTable is just a stream with infinite retention – materialize a table in two lines of code!

KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
KTable customers = builder.table("Customers");

orders.join(payments, EmailTuple::new, JoinWindows.of(1 * MIN))
      .join(customers, (tuple, cust) -> tuple.setCust(cust))
      .peek((key, tuple) -> emailer.sendMail(tuple));

Slide 36

Slide 36 text

36 Kafka is a complete Streaming Platform: The Log, Connectors, Producer, Consumer, Streaming Engine

Slide 37

Slide 37 text

37 What happens when we apply this to Microservices?

Slide 38

Slide 38 text

38 Increasingly we build ecosystems: Microservices around an App

Slide 39

Slide 39 text

39 We break them into services that have specific roles, e.g. a Customer Service and a Shipping Service

Slide 40

Slide 40 text

40 The Problem is now your DATA

Slide 41

Slide 41 text

41 Most services share the same core facts (Orders, Customers, Catalog) – most services live in here

Slide 42

Slide 42 text

42 Kafka works as a Backbone for Services to exchange Events: an event acts as a Notification, and its data is replicated to the services that need it
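
A rough sketch (not from the deck) of the two patterns behind this slide, using the Kafka Streams API: an event consumed as a notification, and a stream of events materialized as locally replicated state. The topic names and the shippingService call are hypothetical.

StreamsBuilder builder = new StreamsBuilder();

// Notification: react to each order event as it arrives.
builder.stream("orders")
       .foreach((orderId, order) -> shippingService.prepareShipment(order));

// State transfer: replicate the customers topic into a local, queryable table.
KTable customers = builder.table("customers");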

Slide 43

Slide 43 text

43 Services on a Streaming Platform

Slide 44

Slide 44 text

44

Slide 45

Slide 45 text

45 Thank you! Questions? Pere Urbón-Bayes Technical Architect (TAM) [email protected] http://www.twitter.com/purbon