Slide 1

What is Apache Kafka, and What is a Streaming Platform?
Budapest Data Forum, 14 Jun 2018
Robin Moffatt, @rmoff
https://speakerdeck.com/rmoff/

Slide 2

@rmoff / What is Apache Kafka, and What is a Streaming Platform? / Budapest Data Forum, June 2018
$ whoami
• Developer Advocate @ Confluent
• Working in data & analytics since 2001
• Oracle ACE Director & Dev Champion
• Blogging: http://rmoff.net & http://cnfl.io/rmoff
• Twitter: @rmoff
• Geek stuff
• Beer & Fried Breakfasts

Slide 3

Apache Kafka is a Streaming Platform

Slide 4

Slide 5

Three Lenses

Slide 6

What is Apache Kafka?
01 Messaging Done Right
02 Scalable Streaming Data Pipelines
03 Foundation for Stream Processing

Slide 7

Lens 1: Messaging Done Right
• Scalability
• True Storage
• Real-Time Processing

Slide 8

Lens 2: Scalable Streaming Data Pipelines

Slide 9

Lens 2: Scalable Streaming Data Pipelines

Slide 10

Lens 3: Foundation for Stream Processing
KSQL is the Streaming SQL Engine for Apache Kafka

Slide 11

The Streaming Platform

Slide 12

The Streaming Platform
• Event-Driven
• Scalable
• Decoupled

Slide 13

Bold claim: all your data is event streams

Slide 14

A Customer Experience

Slide 15

A Sale

Slide 16

A Sensor Reading

Slide 17

An Application Log Entry

Slide 18

Databases

Slide 19

Do you think that’s a table you are querying?

Slide 20

The Stream-Table Duality
Table:
  Account ID | Balance
  12345      | €50

Slide 21

The Stream-Table Duality
Stream (events over time):
  Account ID | Amount
  12345      | +€50
Table (state after each event):
  after +€50 → Account ID 12345 | Balance €50

Slide 22

The Stream-Table Duality
Stream (events over time):
  Account ID | Amount
  12345      | +€50
  12345      | +€25
Table (state after each event):
  after +€50 → Account ID 12345 | Balance €50
  after +€25 → Account ID 12345 | Balance €75

Slide 23

The Stream-Table Duality
Stream (events over time):
  Account ID | Amount
  12345      | +€50
  12345      | +€25
  12345      | -€60
Table (state after each event):
  after +€50 → Account ID 12345 | Balance €50
  after +€25 → Account ID 12345 | Balance €75
  after -€60 → Account ID 12345 | Balance €15

Slide 24

The Stream-Table Duality
Stream (events over time):
  Account ID | Amount
  12345      | +€50
  12345      | +€25
  12345      | -€60
Table (state after each event):
  after +€50 → Account ID 12345 | Balance €50
  after +€25 → Account ID 12345 | Balance €75
  after -€60 → Account ID 12345 | Balance €15
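The duality on these slides can be reproduced in a few lines: folding the stream of amounts in order yields the table, and diffing successive table states gives the stream back. This is a minimal plain-Python sketch, not KSQL or a Kafka API; the function names are hypothetical, and for simplicity it assumes rows are never deleted.

```python
# Toy model of the stream-table duality (plain Python, not a Kafka/KSQL API).
# Stream: ordered (account_id, amount) events. Table: current balance per account.
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (account_id, amount in euros)


def table_snapshots(stream: List[Event]) -> List[Dict[str, int]]:
    """Stream -> table: fold events in order, keeping the table state after each one."""
    table: Dict[str, int] = {}
    snapshots: List[Dict[str, int]] = []
    for account_id, amount in stream:
        table[account_id] = table.get(account_id, 0) + amount
        snapshots.append(dict(table))  # copy, so later updates don't alter it
    return snapshots


def to_changelog(snapshots: List[Dict[str, int]]) -> List[Event]:
    """Table -> stream: diff successive table states back into per-key deltas.

    Assumes keys are never deleted; a zero-amount event changes nothing
    in the table, so it cannot be recovered from the snapshots.
    """
    previous: Dict[str, int] = {}
    changes: List[Event] = []
    for snapshot in snapshots:
        for account_id, balance in snapshot.items():
            if previous.get(account_id) != balance:
                changes.append((account_id, balance - previous.get(account_id, 0)))
        previous = snapshot
    return changes


stream = [("12345", 50), ("12345", 25), ("12345", -60)]
snapshots = table_snapshots(stream)
print(snapshots)                          # [{'12345': 50}, {'12345': 75}, {'12345': 15}]
print(to_changelog(snapshots) == stream)  # True
```

Either form can be derived from the other, which is the point of the slide: the table is a view over the stream at a given moment.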

Slide 25

"The truth is the log. The database is a cache of a subset of the log."
Pat Helland, Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash

Slide 26

Event-Driven architectures in action…

Slide 27

Event-Driven architectures in action…

Slide 28

Event-Driven architectures in action…

Slide 29

Event-Driven architectures in action…

Slide 30

Event-Driven architectures in action…

Slide 31

Event-Driven architectures in action…

Slide 32

Event-Driven architectures in action…

Slide 33

A Brief Look at Kafka's Technology

Slide 34

Apache Kafka
Kafka: A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.
• Writes are append only
• Reads are a single seek & scan
Producer & Consumer APIs: Open-source client libraries for numerous languages, to directly integrate with your applications.
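To make "append-only writes, seek-and-scan reads" concrete, here is a toy in-memory commit log. The `CommitLog` class and its methods are hypothetical illustrations, not the Kafka client API; a real Kafka partition is stored as segment files on disk and replicated across brokers, which this sketch ignores.

```python
# Toy append-only commit log (in-memory illustration, not Kafka's implementation).
from typing import List, Tuple


class CommitLog:
    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, record: bytes) -> int:
        """Writes only ever append; return the offset the record landed at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 100) -> List[Tuple[int, bytes]]:
        """Seek to `offset`, then scan forward returning (offset, record) pairs."""
        end = min(offset + max_records, len(self._records))
        return [(i, self._records[i]) for i in range(offset, end)]

    def end_offset(self) -> int:
        """Offset the next appended record will get."""
        return len(self._records)


log = CommitLog()
for event in [b"login", b"view", b"purchase"]:
    log.append(event)
print(log.read(1))  # [(1, b'view'), (2, b'purchase')]
```

Because existing records are never modified, any number of readers can scan from different offsets without coordinating with the writer.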

Slide 35

Apache Kafka
[Diagram: Orders, Customers, Table, Kafka Streams API]
Kafka Connect API: Reliable and scalable integration of Kafka with other systems, no coding required.
Kafka Streams API: Write standard Java applications & microservices to process your data in real-time.

Slide 36

KSQL is the Streaming SQL Engine for Apache Kafka

Slide 37

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data

CREATE STREAM SYSLOG_INVALID_USERS AS
  SELECT HOST, MESSAGE
  FROM SYSLOG
  WHERE MESSAGE LIKE '%Invalid user%';

http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting
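The continuous query above reads every new SYSLOG record once, keeps those whose MESSAGE contains "Invalid user", and writes HOST and MESSAGE to a new stream. The plain-Python sketch below models those semantics with a generator; the record layout and sample lines are hypothetical, and `LIKE '%Invalid user%'` is treated as a simple case-sensitive substring test, which is an assumption about the matching rules rather than a statement about KSQL.

```python
# Model of: CREATE STREAM SYSLOG_INVALID_USERS AS
#             SELECT HOST, MESSAGE FROM SYSLOG WHERE MESSAGE LIKE '%Invalid user%';
# A generator works on an unbounded input, like a continuous query does.
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


def syslog_invalid_users(syslog: Iterable[Record]) -> Iterator[Record]:
    for record in syslog:
        # Assumption: LIKE '%Invalid user%' treated as a case-sensitive substring match
        if "Invalid user" in record["MESSAGE"]:
            # Projection: keep only the selected columns
            yield {"HOST": record["HOST"], "MESSAGE": record["MESSAGE"]}


# Hypothetical sample records
syslog = [
    {"HOST": "web01", "FACILITY": "auth", "MESSAGE": "Invalid user admin from 10.0.0.1"},
    {"HOST": "web02", "FACILITY": "auth", "MESSAGE": "Accepted publickey for deploy"},
    {"HOST": "db01", "FACILITY": "auth", "MESSAGE": "Invalid user oracle from 10.0.0.7"},
]
for alert in syslog_invalid_users(syslog):
    print(alert["HOST"], alert["MESSAGE"])
```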

Slide 38

KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data

CREATE STREAM platinum_customer_ratings AS
  SELECT r.message, r.rating, c.customer_name, c.level
  FROM ratings r
  LEFT JOIN customers c ON r.userid = c.id
  WHERE c.level = 'Platinum';
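The semantics of this stream-table join can be sketched in plain Python: CUSTOMERS behaves as a table holding the latest row per `id`, and each rating is enriched with whatever customer row is current when it arrives. One detail worth noticing: because the WHERE clause filters on `c.level`, ratings with no matching customer are dropped even though the join is a LEFT JOIN. The single interleaved event list and all sample rows are hypothetical simplifications; in KSQL the stream and the table are each backed by their own topic.

```python
# Plain-Python model of the ratings/customers stream-table join (not KSQL itself).
from typing import Dict, Iterable, Iterator, Tuple

Row = Dict[str, object]


def platinum_customer_ratings(events: Iterable[Tuple[str, Row]]) -> Iterator[Row]:
    customers: Dict[object, Row] = {}  # table: latest row per customer id
    for kind, row in events:
        if kind == "customer":
            customers[row["id"]] = row  # table update: upsert by key
        elif kind == "rating":
            c = customers.get(row["userid"])  # LEFT JOIN: None if no match
            level = c["level"] if c is not None else None
            if level == "Platinum":  # WHERE c.level = 'Platinum' drops NULL levels too
                yield {
                    "message": row["message"],
                    "rating": row["rating"],
                    "customer_name": c["customer_name"],
                    "level": level,
                }


# Hypothetical events, in arrival order
events = [
    ("customer", {"id": 1, "customer_name": "Ann", "level": "Platinum"}),
    ("customer", {"id": 2, "customer_name": "Bob", "level": "Silver"}),
    ("rating", {"userid": 1, "rating": 5, "message": "great"}),
    ("rating", {"userid": 2, "rating": 3, "message": "ok"}),
    ("rating", {"userid": 3, "rating": 1, "message": "no customer row"}),
    ("customer", {"id": 2, "customer_name": "Bob", "level": "Platinum"}),
    ("rating", {"userid": 2, "rating": 4, "message": "better"}),
]
for enriched in platinum_customer_ratings(events):
    print(enriched)
```

Bob's first rating is filtered out because he was Silver when it arrived; after the table update, his next rating passes.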

Slide 39

Streaming ETL with Apache Kafka and KSQL
[Diagram: MySQL → Debezium (Kafka Connect) and Producer API into Kafka; Kafka Connect → Elasticsearch]

Slide 40

What Problems does Kafka Solve?

Slide 41

Event-Centric Thinking
[Diagram: Web app, Streaming Platform ("A product was viewed"), Hadoop]

Slide 42

Event-Centric Thinking
[Diagram: Web app, mobile app, APIs, Streaming Platform ("A product was viewed"), Hadoop]

Slide 43

Event-Centric Thinking
[Diagram: web app, mobile app, APIs, Streaming Platform ("A product was viewed"), Hadoop, Security Monitoring, Rec engine]
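Event-centric thinking in miniature: producers write each "a product was viewed" event once, and every downstream system reads the same log at its own position, so adding a consumer does not touch the producers. This is an in-memory toy; the `publish`/`poll` helpers and sample events are hypothetical. In Kafka, each downstream system would typically be its own consumer group on the topic.

```python
# Toy fan-out: one shared log, one read position per downstream system.
from collections import Counter
from typing import Dict, List

Event = Dict[str, str]
log: List[Event] = []  # the shared, append-only event log
positions: Dict[str, int] = {"hadoop": 0, "security_monitoring": 0, "rec_engine": 0}


def publish(event: Event) -> None:
    log.append(event)


def poll(consumer: str) -> List[Event]:
    """Return events this consumer has not seen yet and advance its position."""
    start = positions[consumer]
    positions[consumer] = len(log)
    return log[start:]


# Web app, mobile app and APIs all publish the same kind of event
publish({"source": "web app", "user": "u1", "product": "p1"})
publish({"source": "mobile app", "user": "u2", "product": "p1"})
publish({"source": "APIs", "user": "u1", "product": "p2"})

archived = poll("hadoop")                                       # keep everything
popularity = Counter(e["product"] for e in poll("rec_engine"))  # what's popular
views_per_user = Counter(e["user"] for e in poll("security_monitoring"))
print(len(archived), popularity["p1"], views_per_user["u1"])  # 3 2 2
```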

Slide 44

System Availability and Event Buffering
[Diagram: Producer, Consumer]

Slide 45

System Availability and Event Buffering
[Diagram: Producer, Consumer]
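The buffering idea can be modelled with a log and a consumer that tracks its committed offset: the producer keeps appending while the consumer is offline, and on restart the consumer resumes from where it left off. The `Log` and `Consumer` classes are a hypothetical in-memory sketch, not the Kafka client API, and they ignore retention limits and broker failures.

```python
# Toy model of buffering: the log absorbs writes while the consumer is down.
from typing import List


class Log:
    def __init__(self) -> None:
        self.records: List[str] = []

    def append(self, record: str) -> None:
        self.records.append(record)


class Consumer:
    def __init__(self, log: Log) -> None:
        self.log = log
        # Next offset to read; Kafka stores committed offsets so a restarted
        # consumer can resume from them.
        self.committed = 0
        self.processed: List[str] = []

    def poll_and_commit(self) -> None:
        """Process everything after the committed offset, then commit the new position."""
        batch = self.log.records[self.committed:]
        self.processed.extend(batch)
        self.committed += len(batch)


log = Log()
consumer = Consumer(log)
log.append("order-1")
consumer.poll_and_commit()  # consumer is up: handles order-1
# Consumer goes offline; the producer is unaffected and keeps writing
log.append("order-2")
log.append("order-3")
consumer.poll_and_commit()  # consumer comes back and catches up from offset 1
print(consumer.processed)   # ['order-1', 'order-2', 'order-3']
```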

Slide 46

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Consumer A]

Slide 47

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Consumer A, Consumer B]

Slide 48

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Consumer A, Consumer B]

Slide 49

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B]

Slide 50

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B]

Slide 51

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, Realtime, Consumer A (24hr batch extract), Consumer B (Realtime)]
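With the log in the middle, the producer writes each event once in real time and each consumer picks its own cadence. The sketch below computes how long each event waits before a consumer sees it, for a real-time consumer that polls as soon as an event lands and a batch consumer that polls once per 24-hour window. The timestamps (in hours) and the `delivery_latencies` helper are hypothetical.

```python
# Same events, two consumers with different polling schedules (times in hours).
from typing import List, Tuple

events: List[Tuple[int, str]] = [(1, "a"), (5, "b"), (23, "c"), (30, "d")]


def delivery_latencies(events: List[Tuple[int, str]], poll_times: List[int]) -> List[int]:
    """Hours each event waits until the first poll at or after its write time."""
    # Assumes at least one poll happens after the last event
    return [min(t for t in poll_times if t >= written_at) - written_at
            for written_at, _ in events]


realtime_polls = [t for t, _ in events]  # Consumer B: poll whenever an event arrives
batch_polls = [24, 48]                   # Consumer A: 24hr batch extract

print(delivery_latencies(events, realtime_polls))  # [0, 0, 0, 0]
print(delivery_latencies(events, batch_polls))     # [23, 19, 1, 18]
```

Both consumers see every event; only the latency differs, and neither imposes its schedule on the producer or on the other consumer.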

Slide 52

Technology & Code/Algo Version Changes
[Diagram: Producer, Consumer (v1)]

Slide 53

Technology & Code/Algo Version Changes
[Diagram: Producer, Consumer (v1), Consumer (v2)]

Slide 54

Technology & Code/Algo Version Changes
[Diagram: Producer, Consumer (v2)]
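Because Kafka retains events, a new version of a consumer can be started alongside the old one, replay the topic from the earliest offset, and be checked against v1 before v1 is retired. The sketch below shows that replay with two hypothetical rules standing in for the v1 and v2 algorithms; the log contents and the `run_consumer` helper are made up for illustration.

```python
# Toy replay: v2 rebuilds its results from the retained log while v1 is still running.
from typing import Callable, List

log: List[float] = [12.0, 7.5, 30.0, 2.5]  # hypothetical order values already in the topic


def run_consumer(rule: Callable[[float], bool], from_offset: int = 0) -> List[int]:
    """Replay the log from `from_offset` and return the offsets the rule flags."""
    return [offset for offset in range(from_offset, len(log)) if rule(log[offset])]


def v1(value: float) -> bool:
    """Hypothetical old algorithm: flag large orders."""
    return value > 20


def v2(value: float) -> bool:
    """Hypothetical new algorithm: also flag very small orders."""
    return value > 20 or value < 5


print(run_consumer(v1))  # [2]
print(run_consumer(v2))  # [2, 3]
```

Once v2's output has been checked against v1's, v1 can be switched off; the producer never needs to know a new version exists.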

Slide 55

Architectural Patterns with Apache Kafka

Slide 56

Building for the Future
Photo by Christopher Burns on Unsplash

Slide 57

Tightly-coupled = Inflexible

Slide 58

Database offload → Hadoop/Object Storage/Cloud DW for Analytics
[Diagram: RDBMS, CDC, HDFS / S3 / BigQuery etc]
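CDC turns each row change in the source database into an event, and the analytics store is kept in step by applying those events in order. The sketch below uses a before/after/op event shape loosely modelled on Debezium's change events (`c` create, `u` update, `d` delete); it is a simplification for illustration, not Debezium's actual schema, and the helper names and sample rows are hypothetical.

```python
# Toy CDC offload: replay simplified change events into a copy of the source table.
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]
ChangeEvent = Dict[str, Any]


def change_event(op: str, before: Optional[Row], after: Optional[Row]) -> ChangeEvent:
    """Build a simplified change event: 'c' = insert, 'u' = update, 'd' = delete."""
    return {"op": op, "before": before, "after": after}


def apply_changes(events: List[ChangeEvent], key: str = "id") -> Dict[Any, Row]:
    """Replay change events, in order, into a key -> row copy of the source table."""
    replica: Dict[Any, Row] = {}
    for event in events:
        if event["op"] in ("c", "u"):
            row = event["after"]
            replica[row[key]] = row
        elif event["op"] == "d":
            replica.pop(event["before"][key], None)
    return replica


# Hypothetical changes to an orders table
events = [
    change_event("c", None, {"id": 1, "status": "NEW"}),
    change_event("c", None, {"id": 2, "status": "NEW"}),
    change_event("u", {"id": 1, "status": "NEW"}, {"id": 1, "status": "SHIPPED"}),
    change_event("d", {"id": 2, "status": "NEW"}, None),
]
print(apply_changes(events))  # {1: {'id': 1, 'status': 'SHIPPED'}}
```

Replaying the full history from the start always rebuilds the same copy, which is the "truth is the log" idea applied to offload.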

Slide 59

Streaming ETL with Apache Kafka and KSQL
[Diagram: RDBMS, CDC, order items, customer, Stream Processing, customer orders]

Slide 60

Real-time Event Stream Enrichment with Apache Kafka and KSQL
[Diagram: order events, RDBMS, CDC, customer, Stream Processing, customer orders]

Slide 61

Transform Once, Use Many
[Diagram: order events, RDBMS, CDC, customer, Stream Processing, customer orders, New App]

Slide 62

Transform Once, Use Many
[Diagram: order events, RDBMS, CDC, customer, Stream Processing, customer orders, New App, HDFS / S3 / etc]

Slide 63

Drive new realtime applications using data from existing systems
[Diagram: Existing App, four New Apps]

Slide 64

Evolve processing from old systems to new
[Diagram: RDBMS, Existing App, CDC]

Slide 65

Evolve processing from old systems to new
[Diagram: RDBMS, Existing App, CDC, Stream Processing, New App]

Slide 66

Evolve processing from old systems to new
[Diagram: RDBMS, Existing App, CDC, Stream Processing, two New Apps]

Slide 67

Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
[Diagram: Database Changes, Log Events, IoT Data, Web Events, … flowing through the Confluent Platform to Data Integration (CRM, Data Warehouse, Database, Hadoop, …) and Real-time Applications (Monitoring, Analytics, Custom Apps, Transformations, …)]
Confluent Platform:
• Apache Kafka®: Core | Connect API | Streams API
• SQL Stream Processing: KSQL
• Data Compatibility: Schema Registry
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing
Editions: Apache Open Source, Confluent Open Source, Confluent Enterprise

Slide 68

Free Books!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle

Slide 69

@rmoff
https://slackpass.io/confluentcommunity
https://www.confluent.io/download/

Slide 70

#EOF