Slide 1

Slide 1 text

Understanding streaming data and analytics with Apache Kafka® @riferrei | @apachekafka | @elastic

Slide 2

Slide 2 text

About me • Ricardo Ferreira • Developer advocate • Elastic community team • Kafka Summit PC member • [email protected]

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Origins of Apache Kafka • “There were lots of databases and other systems built to store data, but what was missing in our architecture was something that would help us to handle continuous flows of data.” – Jay Kreps

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Event-driven architecture • Job change • Recommendation engine • Search engine • Email service

Slide 8

Slide 8 text

Implement with a database • Recommendation engine (SQL) • Search engine (SQL) • Email service (SQL) • Database • Log

Slide 9

Slide 9 text

Databases can’t handle events • Database • 1000x more volume • Non-transactional events • Transactional events • Log

Slide 10

Slide 10 text

Databases 30 years ago...

Slide 11

Slide 11 text

Databases these days

Slide 12

Slide 12 text

Databases are limited

Slide 13

Slide 13 text

Limited? Are you kidding me?

Slide 14

Slide 14 text

Are databases limited? Yes, they are. Why do we have to move data from one DB to another just for analytics?

Slide 15

Slide 15 text

What then?

Slide 16

Slide 16 text

“The truth is the log. The database is a cache of a subset of the log.” – Pat Helland, Immutability Changes Everything, http://cidrdb.org/cidr2015/Papers/CIDR15_Paper16.pdf

Slide 17

Slide 17 text

Log as first-class citizen • Database • Log: 0 1 2 3 4 5 6 7 8 • Writes • Reads • Destination system A (time = 1) • Destination system B (time = 3)
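The log on this slide can be sketched as an append-only list in which every record has an offset and every destination system keeps its own read position. This is only an illustration under assumed names (`Log`, `system_a`, `system_b` and the `event-N` records are hypothetical), not how Kafka stores data internally.

```python
class Log:
    """Append-only sequence of records addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Write a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=1):
        """Return up to max_records starting at offset; reading never removes data."""
        return self._records[offset:offset + max_records]


log = Log()
for n in range(9):  # offsets 0..8, as on the slide
    log.append(f"event-{n}")

# Each destination system tracks its own position, so a slow reader
# does not hold back a fast one.
positions = {"system_a": 1, "system_b": 3}
system_a_view = log.read(positions["system_a"], max_records=2)
system_b_view = log.read(positions["system_b"], max_records=2)
```

Because reads do not consume records, either system can rewind its position and replay the same data later.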

Slide 18

Slide 18 text

Solution: build a commit log • Commit log • User tracking • Historical data • Operational metrics • NoSQL database • Graph database • SQL database • ... • Hadoop • Elasticsearch • Grafana • Machine learning • Rec. engine • Search • Security • Email • Social graph • Microservices

Slide 19

Slide 19 text

Origins of Apache Kafka • “We’ve come to think of Kafka as a streaming platform: a system that lets you publish and subscribe to streams of data, store them, and process them, and that is exactly what Apache Kafka is built to be.” – Jay Kreps

Slide 20

Slide 20 text

Origins of Apache Kafka • Databases • Messaging • Batch • Expensive • Time consuming • Difficult to scale • No persistence after consumption • No replay • Highly scalable • Durable • Persistent • Ordered • Fast (low latency)

Slide 21

Slide 21 text

Origins of Apache Kafka • Databases • Messaging • Batch • Expensive • Time consuming • Difficult to scale • No persistence after consumption • No replay • Highly scalable • Durable • Persistent • Ordered • Fast (low latency) • Distributed commit log

Slide 22

Slide 22 text

Origins of Apache Kafka • Databases • Messaging • Batch • Expensive • Time consuming • Difficult to scale • No persistence after consumption • No replay • Highly scalable • Durable • Persistent • Ordered • Fast (low latency) • Stream processing • Continuous flows • Scalable integration • Distributed streaming platform

Slide 23

Slide 23 text

Origins of Apache Kafka • “The ability to combine these three areas – to bring all the streams of data together across all the use cases – is what makes the idea of a streaming platform so appealing to people.” – Jay Kreps

Slide 24

Slide 24 text

@riferrei | @confluentinc | @itau

Slide 25

Slide 25 text

Distributed streaming platform • 01 Data streams with messaging • 02 Data analytics with stream processing • 03 Sophisticated storage system

Slide 26

Slide 26 text

Data streams with messaging

Slide 27

Slide 27 text

Messaging as you know it • The producer writes to the broker • The broker pushes to the consumer

Slide 28

Slide 28 text

Kafka does messaging differently • The producer writes to the broker • The consumer pulls from the broker

Slide 29

Slide 29 text

Kafka does messaging differently • Broker • Group 1 (pull) • Group 2 (pull) • Group 3 (pull) • Queueing • Pub/sub
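A rough sketch of how consumer groups give both behaviors at once: inside one group the partitions are divided among the members (queueing), while every group independently receives all partitions (pub/sub). The round-robin `assign` function, the group names and the member names are assumptions for illustration; it is a simplification of the assignment strategies that Kafka clients ship with.

```python
def assign(partitions, members):
    """Spread partitions over the members of one group, round-robin."""
    assignment = {member: [] for member in members}
    for i, partition in enumerate(partitions):
        owner = members[i % len(members)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Hypothetical groups with one, two and three members each.
groups = {
    "group_1": ["c1"],
    "group_2": ["c1", "c2"],
    "group_3": ["c1", "c2", "c3"],
}

# Every group gets its own complete assignment of all partitions.
plan = {name: assign(partitions, members) for name, members in groups.items()}
```

Adding members to a group spreads the same partitions over more consumers, and adding a group adds another independent reader of the whole topic.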

Slide 30

Slide 30 text

Kafka does messaging differently • Topic: 0 1 2 3 4 5 6 7 • Partition 1: 0 1 2 3 • Partition 2: 4 5 6 7

Slide 31

Slide 31 text

Kafka does messaging differently • Partition 1: 0 1 2 3 • Partition 2: 4 5 6 7 • Partition 3: 8 9 • Producer (write) • Consumer (pull) • Consumer (pull) • Consumer (pull)

Slide 32

Slide 32 text

Kafka does messaging differently • Partition 1: 0 1 2 3 • Partition 2: 4 5 6 7 • Partition 3: 8 9 • Producer • Key 002
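The idea behind the record key can be sketched as hashing the key and taking the result modulo the number of partitions, so that records with the same key always land in the same partition. Kafka's default partitioner uses a murmur2 hash for keyed records; the CRC32 below is a stand-in chosen because it is in the Python standard library, and `partition_for` is a hypothetical helper.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index (stand-in hash, not Kafka's murmur2)."""
    key_bytes = key.encode("utf-8")
    return zlib.crc32(key_bytes) % num_partitions


# The same key always maps to the same partition, which preserves
# per-key ordering as long as the partition count does not change.
first = partition_for("002", num_partitions=3)
second = partition_for("002", num_partitions=3)
```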

Slide 33

Slide 33 text

Kafka does messaging differently • Producer (serialize, write) • Bytes • Consumer (pull, deserialize)
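Since the broker only sees bytes, the producer has to serialize each record and the consumer has to reverse that step. A minimal sketch, assuming JSON as the wire format and a made-up record; real deployments may well use other formats.

```python
import json


def serialize(value: dict) -> bytes:
    """Producer side: turn a record into bytes before writing it."""
    return json.dumps(value, sort_keys=True).encode("utf-8")


def deserialize(data: bytes) -> dict:
    """Consumer side: turn the pulled bytes back into a record."""
    return json.loads(data.decode("utf-8"))


record = {"user": "002", "score": 150}  # hypothetical payload
payload = serialize(record)
restored = deserialize(payload)
```

Producer and consumer must agree on the format, because the broker does not interpret the payload.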

Slide 34

Slide 34 text

Kafka does messaging differently • Producer (write) • Broker: 250 GB, 250 GB, 500 GB • Data is always persistent

Slide 35

Slide 35 text

Data analytics with stream processing

Slide 36

Slide 36 text

How to process data streams? • Broker • Consumer • 1) Pull • 2) Process • 3) Write • number of records < 4 • 12 • number of records > 5 • 9
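The pull, process, write cycle can be sketched with plain lists standing in for the input and output topics. The threshold filter, the record shape and the batch size are assumptions made for this example; the slide does not say what the processing step computes.

```python
def pull(log, offset, max_records):
    """1) Pull: read the next batch starting at the consumer's offset."""
    return log[offset:offset + max_records]


def process(batch):
    """2) Process: hypothetical step that keeps scores above a threshold."""
    return [record for record in batch if record["score"] > 5]


source = [{"player": "a", "score": s} for s in (3, 9, 12, 4, 7)]
sink = []      # stands in for the output topic
offset = 0     # the consumer's position in the source

while True:
    batch = pull(source, offset, max_records=2)
    if not batch:
        break                      # nothing left to read
    sink.extend(process(batch))    # 3) Write: append results to the sink
    offset += len(batch)           # advance only after the batch is handled
```

Every application that wants this behavior has to write and operate the same loop itself.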

Slide 37

Slide 37 text

How to process data streams? • Broker • Consumer • 1) Pull • 2) Process • 3) Write • number of records < 4 • 12 • number of records > 5 • 9 • What if we could have a processing layer for the data streams?

Slide 38

Slide 38 text

Using stream processors • Producer (write) • Broker • Stream processors • Consumer (pull)

Slide 39

Slide 39 text

Using stream processors • Kafka Streams

Slide 40

Slide 40 text

Using stream processors • ksqlDB

Slide 41

Slide 41 text

Scalable data integration • Broker • Stream processors • Connectors

Slide 42

Slide 42 text

Sophisticated storage system

Slide 43

Slide 43 text

Kafka as a storage system • Elastic storage • Broker 1: 250 GB, 250 GB, 500 GB (1 TB storage) • Broker 2: 500 GB, 500 GB, 500 GB (1.5 TB storage) • Cluster storage → 2.5 TB

Slide 44

Slide 44 text

Kafka as a storage system • Partition-level replication • Broker 1: 250 GB, 250 GB, 500 GB (1 TB storage) • Broker 2: 500 GB, 500 GB, 500 GB (1.5 TB storage) • Partition 1 • Partition 2 • Partition 2

Slide 45

Slide 45 text

Kafka as a storage system • Constant time performance • Commit log (5 KB): consumer polling 100 records, time spent: 1 ms • Commit log (5 TB): consumer polling 100 records, time spent: 1 ms

Slide 46

Slide 46 text

Kafka as a storage system • Optimized for massive reads • Broker 1: 250 GB, 250 GB, 500 GB (1 TB storage) • Partition 1 • Partition 2 • Pagecache • NIC • Consumer • Kafka uses the sendfile API to send data from the pagecache to the socket without passing through user space, which skips: - the copy from kernel space (pagecache) to the user buffer - the copy from the user buffer back to kernel space (socket buffer)
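The zero-copy read path can be tried out with Python's `socket.sendfile`, which uses the operating system's sendfile call where available and otherwise falls back to ordinary sends. This is only a sketch of the mechanism: the temporary file stands in for a log segment and the socket pair stands in for the broker-to-consumer connection. Kafka itself is written in Java and reaches sendfile through `FileChannel.transferTo`.

```python
import socket
import tempfile

payload = b"record-0\nrecord-1\nrecord-2\n"

with tempfile.TemporaryFile() as segment:
    # Stand-in for a log segment on the broker's disk.
    segment.write(payload)
    segment.flush()
    segment.seek(0)

    # Stand-in for the connection between broker and consumer.
    broker_side, consumer_side = socket.socketpair()
    with broker_side, consumer_side:
        # The file content goes to the socket without being read
        # into a buffer owned by this program.
        sent = broker_side.sendfile(segment)

        received = b""
        while len(received) < sent:
            received += consumer_side.recv(1024)
```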

Slide 47

Slide 47 text

Kafka as a storage system • File management in Kafka • Partition 0 • Partition 1 • Partition 2 • Segment 0 • Segment 1 • Segment 2 • 0000Seg1.log • 0000Seg1.index
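Each partition is split into segments, and each segment has a log file plus an index file. A sketch of how a reader could find the segment that holds a given offset, assuming segments are identified by the first offset they contain and that file names are that offset zero-padded to 20 digits; the base offsets below are made up.

```python
import bisect

# Hypothetical base offsets of three segments in one partition, sorted.
base_offsets = [0, 1000, 2000]


def segment_for(offset: int) -> str:
    """Return the log file name of the segment that contains offset."""
    # Find the last segment whose base offset is <= the requested offset.
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} is before the first segment")
    return f"{base_offsets[i]:020d}.log"


name = segment_for(1500)
```

The binary search keeps the lookup cheap even when a partition has many segments, and the matching index file can then narrow the search inside the chosen segment.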

Slide 48

Slide 48 text

Putting the pieces together

Slide 49

Slide 49 text

Streaming Pac-Man

Slide 50

Slide 50 text

Streaming Pac-Man • API Gateway • Lambda function • Kafka (MSK) • ksqlDB (ECS) • Kafka (MSK) • Scoreboard • https://github.com/riferrei/streaming-pacman-aws

Slide 51

Slide 51 text

Streaming Pac-Man • 1. Get the game • 2. Name yourself

Slide 52

Slide 52 text

Making data available

Slide 53

Slide 53 text

From Kafka to the world • API Gateway • Lambda function • Scoreboard • Redis cache • Push

Slide 54

Slide 54 text

From Kafka to the world • Amazon Alexa • Lambda function • Scoreboard • Redis cache • Push

Slide 55

Slide 55 text

From Kafka to the world • Your code • ksqlDB (ECS) • Pull • Kafka (MSK)

Slide 56

Slide 56 text

How can I learn more?

Slide 57

Slide 57 text

Use professional books

Slide 58

Slide 58 text

Thank you