Slide 1

Slide 1 text

@rmoff / Embrace the Anarchy—Apache Kafka's Role in Modern Data Architectures
Embrace the Anarchy: Apache Kafka's Role in Modern Data Architectures
Robin Moffatt / Confluent
Photo by Jaak Horn on Unsplash

Slide 2

Slide 2 text

$ whoami
• Developer Advocate @ Confluent
• Working in data & analytics since 2001
• Oracle Developer Champion
• Blogging: http://rmoff.net & http://cnfl.io/rmoff
• Twitter: @rmoff
• Geek stuff
• Beer & Fried Breakfasts
https://speakerdeck.com/rmoff/

Slide 3

Slide 3 text

Apache Kafka is a Streaming Platform

Slide 4

Slide 4 text

Why do we need a streaming platform?

Slide 5

Slide 5 text

One of the reasons: Decoupling

Slide 6

Slide 6 text

A case in point… Analytics

Slide 7

Slide 7 text

Analytics: In the beginning…
[Diagram: Sales, DWH]

Slide 8

Slide 8 text

And then there were more data sources…
[Diagram: Sales, Inventory, DWH]

Slide 9

Slide 9 text

Batch Transformations… (ETL / ELT)
[Diagram: Sales, Inventory, DWH]

Slide 10

Slide 10 text

Add a Data Lake…
[Diagram: Sales, Inventory, DWH, Data Lake]

Slide 11

Slide 11 text

…or Replace the Data Warehouse
[Diagram: Sales, Inventory, Data Lake]

Slide 12

Slide 12 text

Still need to do Batch transformations…
[Diagram: Sales, Inventory, Data Lake]

Slide 13

Slide 13 text

Want your data anytime?
Batch is Latency built in by Design

Slide 14

Slide 14 text

The World has Changed
• Microservices
• Mobile
• Machine Learning
• Internet of Things
Photo by Denys Nevozhai on Unsplash

Slide 15

Slide 15 text

Lots of new technologies (whether you like it or not)
Photo by Rosie Fraser on Unsplash

Slide 16

Slide 16 text

[Diagram: App, App, App, App, search, Hadoop, DWH, monitoring, security, MQ, MQ, cache, cache]

Slide 17

Slide 17 text

[Diagram: KAFKA, DWH, Hadoop, App (×8); labels: request-response, messaging OR stream processing, streaming data pipelines, changelogs]

Slide 18

Slide 18 text

Apache Kafka is a Streaming Platform

Slide 19

Slide 19 text

Three Lenses

Slide 20

Slide 20 text

What is Apache Kafka?
01 Messaging Done Right
02 Scalable Streaming Data Pipelines
03 Foundation for Stream Processing

Slide 21

Slide 21 text

Lens 1: Messaging Done Right
• Scalability
• True Storage
• Real-Time Processing

Slide 22

Slide 22 text

Lens 2: Scalable Streaming Data Pipelines

Slide 23

Slide 23 text

Lens 3: Foundation for Stream Processing
KSQL is the Streaming SQL Engine for Apache Kafka

Slide 24

Slide 24 text

The Streaming Platform

Slide 25

Slide 25 text

The Streaming Platform
• Event-Driven
• Scalable
• Decoupled

Slide 26

Slide 26 text

Bold claim: all your data is event streams

Slide 27

Slide 27 text

A Customer Experience

Slide 28

Slide 28 text

A Sale

Slide 29

Slide 29 text

A Sensor Reading

Slide 30

Slide 30 text

An Application Log Entry

Slide 31

Slide 31 text

Databases

Slide 32

Slide 32 text

Do you think that’s a table you are querying?

Slide 33

Slide 33 text

The Table Stream Duality

Stream (in time order):
Account ID   Amount
12345        + €50
12345        + €25
12345        - €60

Table (state after each event in the stream):
Account ID   Balance
12345        €50
12345        €75
12345        €15
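
The duality on this slide can be sketched in a few lines of Python. This is only an illustration, not Kafka or KSQL code: the function name `apply_stream` and the use of plain integers for euro amounts are assumptions made for the example. It replays the stream of account events and shows the table state after each one.

```python
from collections import defaultdict


def apply_stream(events):
    """Fold a stream of (account_id, amount) events into a balance table.

    Returns the final table and a snapshot of the table after each event.
    """
    table = defaultdict(int)
    snapshots = []
    for account_id, amount in events:
        table[account_id] += amount      # the table is the running sum of the stream
        snapshots.append(dict(table))    # copy, so later events do not alter it
    return dict(table), snapshots


# The three events from the slide, in time order (amounts in euros).
events = [("12345", 50), ("12345", 25), ("12345", -60)]
table, snapshots = apply_stream(events)
print(snapshots)  # [{'12345': 50}, {'12345': 75}, {'12345': 15}]
```

Replaying the same events always rebuilds the same table, which is the sense in which the stream is the more fundamental of the two.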

Slide 34

Slide 34 text

"The truth is the log. The database is a cache of a subset of the log."
Pat Helland, Immutability Changes Everything
http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf
Photo by Bobby Burch on Unsplash

Slide 35

Slide 35 text

A Brief Look at Kafka's Technology

Slide 36

Slide 36 text

Apache Kafka
Kafka: A Distributed Commit Log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data. Stream processing.
• Reads are a single seek & scan
• Writes are append only
Producer & Consumer APIs: Open-source client libraries for numerous languages, to directly integrate with your applications.
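
To make "writes are append only" and "reads are a single seek & scan" concrete, here is a toy in-memory log in Python. It is a sketch under simplifying assumptions (single process, no persistence, no partitions); the class `CommitLog` and its methods are hypothetical and are not part of any Kafka client library.

```python
class CommitLog:
    """Toy append-only log. Hypothetical illustration, not Kafka's API."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write path: add the record at the end and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Read path: seek to an offset, then scan forward sequentially."""
        return self._records[offset:offset + max_records]


log = CommitLog()
for record in ["a", "b", "c", "d"]:
    log.append(record)

print(log.read(1, max_records=2))  # ['b', 'c']
```

Existing records are never modified or reordered, so a reader only needs to remember one number, its offset.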

Slide 37

Slide 37 text

Apache Kafka
Kafka Connect API: Reliable and scalable integration of Kafka with other systems – no coding required.
Kafka Streams API: Write standard Java applications & microservices to process your data in real-time
[Diagram: Orders, Table, Customers, Kafka Streams API]

Slide 38

Slide 38 text

KSQL is a Declarative Stream Processing Language

Slide 39

Slide 39 text

KSQL is the Streaming SQL Engine for Apache Kafka

Slide 40

Slide 40 text

KSQL in Development and Production
• Interactive KSQL for development and testing: “Hmm, let me try out this idea...”
• Headless KSQL for Production: Desired KSQL queries have been identified
[Diagram label: REST]

Slide 41

Slide 41 text

KSQL for Real-Time Monitoring
• Log data monitoring, tracking and alerting
• syslog data
• Sensor / IoT data
CREATE STREAM SYSLOG_INVALID_USERS AS
  SELECT HOST, MESSAGE
  FROM SYSLOG
  WHERE MESSAGE LIKE '%Invalid user%';
http://cnfl.io/syslogs-filtering / http://cnfl.io/syslog-alerting

Slide 42

Slide 42 text

KSQL for Anomaly Detection
Identifying patterns or anomalies in real-time data, surfaced in milliseconds
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
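
For readers unfamiliar with tumbling windows, the following Python sketch mimics what the query computes. It is not how KSQL executes the statement: it runs over a finite list rather than an unbounded stream, and the function name, the `(timestamp_seconds, card_number)` tuple layout, and the sample data are all assumptions made for the example.

```python
from collections import Counter


def possible_fraud(attempts, window_size=5, threshold=3):
    """Count attempts per card in fixed, non-overlapping time windows.

    attempts: iterable of (timestamp_seconds, card_number) tuples.
    Returns {(window_start, card_number): count} where count > threshold.
    """
    counts = Counter()
    for ts, card in attempts:
        # Each timestamp belongs to exactly one window: [start, start + size).
        window_start = (ts // window_size) * window_size
        counts[(window_start, card)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample data: card "A" is tried four times within 0-4s.
attempts = [(0, "A"), (1, "A"), (2, "A"), (4, "A"), (5, "A"), (3, "B"), (9, "B")]
print(possible_fraud(attempts))  # {(0, 'A'): 4}
```

The attempt at second 5 falls into the next window, so it does not add to the count for the first one.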

Slide 43

Slide 43 text

KSQL for Streaming ETL
Joining, filtering, and aggregating streams of event data
CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
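
The join above enriches each click event with a lookup against the users table. A rough Python equivalent, assuming the table is held as a dictionary keyed by `user_id` and events are plain dicts (both assumptions for illustration only), looks like this:

```python
def vip_actions(clickstream, users):
    """Join each click event to the users table and keep Platinum users.

    clickstream: iterable of dicts with 'userid', 'page', 'action'.
    users: dict mapping user_id -> {'level': ...} (the table side of the join).
    """
    for event in clickstream:
        user = users.get(event["userid"])  # left join: user may be missing
        if user is not None and user["level"] == "Platinum":
            yield {
                "userid": event["userid"],
                "page": event["page"],
                "action": event["action"],
            }


# Hypothetical sample data.
users = {1: {"level": "Platinum"}, 2: {"level": "Gold"}}
clicks = [
    {"userid": 1, "page": "/home", "action": "view"},
    {"userid": 2, "page": "/cart", "action": "add"},
    {"userid": 3, "page": "/home", "action": "view"},
]
result = list(vip_actions(clicks, users))
print(result)  # [{'userid': 1, 'page': '/home', 'action': 'view'}]
```

Because the filter is on a column from the users side, events with no matching user are dropped, even though the join itself is a left join.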

Slide 44

Slide 44 text

What Problems does Kafka Solve?

Slide 45

Slide 45 text

Event-Centric Thinking
“A product was viewed”
[Diagram: Web app, Streaming Platform, Hadoop]

Slide 46

Slide 46 text

Event-Centric Thinking
“A product was viewed”
[Diagram: Web app, mobile app, APIs, Streaming Platform, Hadoop]

Slide 47

Slide 47 text

Event-Centric Thinking
“A product was viewed”
[Diagram: mobile app, web app, APIs, Streaming Platform, Hadoop, Security, Monitoring, Rec engine]

Slide 48

Slide 48 text

System Availability and Event Buffering
[Diagram: Producer, Consumer]

Slide 49

Slide 49 text

System Availability and Event Buffering
[Diagram: Producer, Consumer]
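
One way to picture event buffering: the log retains events while a consumer is unavailable, and the consumer resumes from the offset it had reached. The Python sketch below is a single-process toy; the `Topic` and `Consumer` classes are hypothetical and are not the Kafka client API.

```python
class Topic:
    """Toy stand-in for a topic: an append-only list of events."""

    def __init__(self):
        self.records = []

    def produce(self, event):
        self.records.append(event)


class Consumer:
    """Reads from a topic and remembers how far it has got."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Return everything written since the last poll, then advance."""
        batch = self.topic.records[self.offset:]
        self.offset += len(batch)
        return batch


topic = Topic()
consumer = Consumer(topic)

topic.produce("e1")
first = consumer.poll()        # consumer is up: ['e1']

# Consumer is offline here; the producer is unaffected and keeps writing.
topic.produce("e2")
topic.produce("e3")

caught_up = consumer.poll()    # consumer is back: ['e2', 'e3']
print(first, caught_up)
```

The producer never needs to know whether the consumer is running, which is the decoupling the slide refers to.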

Slide 50

Slide 50 text

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Consumer A]

Slide 51

Slide 51 text

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Consumer A, Consumer B]

Slide 52

Slide 52 text

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Consumer A, Consumer B]

Slide 53

Slide 53 text

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B]

Slide 54

Slide 54 text

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Realtime, Consumer A, Consumer B]

Slide 55

Slide 55 text

Varying Latency Requirements / Batch vs Stream
[Diagram: Producer, 24hr batch extract, Realtime, Realtime, Consumer A, Consumer B]

Slide 56

Slide 56 text

Technology & Code/Algo Version Changes
[Diagram: Producer, Consumer (v1)]

Slide 57

Slide 57 text

Technology & Code/Algo Version Changes
[Diagram: Producer, Consumer (v1), Consumer (v2)]

Slide 58

Slide 58 text

Technology & Code/Algo Version Changes
[Diagram: Producer, Consumer (v2)]
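
Because the log is retained, a new version of a consumer can be started from the beginning and run alongside the old one before the old one is retired. The sketch below is one reading of these three slides, not a description of any specific tooling; the `consume` helper and the two transform functions are hypothetical.

```python
def consume(log, start_offset, transform):
    """Apply a consumer's logic to every record from start_offset onward."""
    return [transform(record) for record in log[start_offset:]]


def logic_v1(value):
    """Original algorithm (hypothetical)."""
    return value * 2


def logic_v2(value):
    """Revised algorithm (hypothetical)."""
    return value * 2 + 1


log = [3, 5, 8]                       # records written by the producer
v1_out = consume(log, 0, logic_v1)    # v1 has processed what exists so far

log.append(13)                        # producer carries on regardless

# v2 is deployed later but starts at offset 0, so it reprocesses history
# with the new logic and can be compared against v1 before v1 is removed.
v2_out = consume(log, 0, logic_v2)
print(v1_out, v2_out)  # [6, 10, 16] [7, 11, 17, 27]
```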

Slide 59

Slide 59 text

Architectural Patterns with Apache Kafka

Slide 60

Slide 60 text

Building for the Future
Photo by Christopher Burns on Unsplash

Slide 61

Slide 61 text

Tightly-coupled = Inflexible

Slide 62

Slide 62 text

Analytics - Database Offload
[Diagram: RDBMS, CDC, HDFS / S3 / BigQuery etc]

Slide 63

Slide 63 text

Stream Processing with Apache Kafka and KSQL
[Diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders]

Slide 64

Slide 64 text

Real-time Event Stream Enrichment
[Diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders]

Slide 65

Slide 65 text

Transform Once, Use Many
[Diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders, New App]

Slide 66

Slide 66 text

Transform Once, Use Many
[Diagram: RDBMS, CDC, customer, order events, Stream Processing, customer orders, HDFS / S3 / etc, New App]

Slide 67

Slide 67 text

Evolve processing from old systems to new
[Diagram: RDBMS, CDC, Existing App, Stream Processing, New App]

Slide 68

Slide 68 text

Evolve processing from old systems to new
[Diagram: RDBMS, CDC, Existing App, Stream Processing, New App, New App]

Slide 69

Slide 69 text

Want your data anytime?
Batch is Latency built in by Design
You say that like "latency" is a synonym for "evil"

Slide 70

Slide 70 text

It's all about the Events!

Slide 71

Slide 71 text

So… Analytics and Kafka

Slide 72

Slide 72 text

The Vision!
"One version of the truth"

Slide 73

Slide 73 text

The Reality…

Slide 74

Slide 74 text

Pragmatism is…
"One version of the truth"

Slide 75

Slide 75 text

"One version of the truth"
[Diagram: Streaming Platform, Stream Processing]

Slide 76

Slide 76 text

"One version of the truth"
[Diagram: Streaming Platform, Stream Processing, ML, App, NoSQL, Search, Graph]

Slide 77

Slide 77 text

Confluent Open Source: Apache Kafka with a bunch of cool stuff! For free!
[Diagram: Confluent Platform]
Diagram labels (surrounding systems): Database Changes, Log Events, IoT Data, Web Events, …, CRM, Data Warehouse, Database, Hadoop, Data Integration, …, Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
Apache Kafka®: Core | Connect API | Streams API
Development and Connectivity: Clients | Connectors | REST Proxy | CLI
Data Compatibility: Schema Registry
SQL Stream Processing: KSQL
Monitoring & Administration: Confluent Control Center | Security
Operations: Replicator | Auto Data Balancing
Legend: Apache Open Source, Confluent Open Source, Confluent Enterprise

Slide 78

Slide 78 text

Free Books!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle

Slide 79

Slide 79 text

Confluent Streaming Event, Munich
http://cnfl.io/streaming-event-munich

Slide 80

Slide 80 text

@rmoff robin@confluent.io https://www.confluent.io/download/ http://cnfl.io/slack

Slide 81

Slide 81 text

Resources
• CDC Spreadsheet
• Blog: No More Silos: How to Integrate your Databases with Apache Kafka and CDC
• #partner-engineering on Slack for questions
• BD team (#partners / partners@confluent.io) can help with introductions on a given sales op