Slide 1

Slide 1 text

DIVIDE, DISTRIBUTE AND CONQUER: STREAM V. BATCH

Slide 2

Slide 2 text

Stream v. Batch

Slide 3

Slide 3 text

Who am I?

Slide 4

Slide 4 text

Who am I?
• Solutions Architect

Slide 5

Slide 5 text

Who am I?
• Solutions Architect
• Developer Advocate

Slide 6

Slide 6 text

Who am I?
• Solutions Architect
• Developer Advocate
• @gamussa in internetz

Slide 7

Slide 7 text

Who am I?
• Solutions Architect
• Developer Advocate
• @gamussa in internetz
• Hey you, yes, you, go follow me on Twitter

Slide 8

Slide 8 text

@gamussa @confluentinc @DataSciCon
BATCH PROCESSING
Data at rest

Slide 9

Slide 9 text

Data and Queries
Origin and processing

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Data…

Slide 12

Slide 12 text

Data…

Slide 13

Slide 13 text

Data…
✓ … inherently immutable
✓ … time-based

Slide 14

Slide 14 text

CRUD -> CR (Create, Read)
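To make "CRUD -> CR" concrete, here is a minimal sketch assuming an in-memory append-only log. The class `AppendOnlyLog` and the flight-status records are hypothetical illustrations, not a Kafka API. Records are only created and read. An "update" is just another appended fact, and the current state is derived by replaying the log.

```python
from typing import Any, Dict, List, Tuple


class AppendOnlyLog:
    """Hypothetical in-memory log: records can be appended (Create) and read (Read), never changed."""

    def __init__(self) -> None:
        self._records: List[Tuple[str, Any]] = []

    def append(self, key: str, value: Any) -> int:
        """Create: add a record at the end of the log and return its offset."""
        self._records.append((key, value))
        return len(self._records) - 1

    def read(self, offset: int) -> Tuple[str, Any]:
        """Read: return the record stored at `offset`."""
        return self._records[offset]

    def __len__(self) -> int:
        return len(self._records)


def current_state(log: AppendOnlyLog) -> Dict[str, Any]:
    """Derive the latest value per key by replaying the whole log from offset 0."""
    state: Dict[str, Any] = {}
    for offset in range(len(log)):
        key, value = log.read(offset)
        state[key] = value  # later records win; earlier ones remain in the log
    return state


log = AppendOnlyLog()
log.append("DL1", "scheduled")
log.append("UA7", "scheduled")
log.append("DL1", "delayed")  # an "update" is just one more appended fact
print(current_state(log))  # {'DL1': 'delayed', 'UA7': 'scheduled'}
print(log.read(0))  # ('DL1', 'scheduled')
```

Because nothing is overwritten, the earlier "scheduled" status of DL1 is still readable at offset 0. That is what makes replaying and reprocessing possible.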

Slide 15

Slide 15 text

Processing is a query

Slide 16

Slide 16 text

Processing is a query
• Function on full data set

Slide 17

Slide 17 text

Processing is a query
• Function on full data set
• Projection

Slide 18

Slide 18 text

Processing is a query
• Function on full data set
• Projection
• Aggregations

Slide 19

Slide 19 text

Processing is a query
• Function on full data set
• Projection
• Aggregations
• Joins
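A small sketch of batch processing as "a function on the full data set", covering the three query shapes listed: projection, aggregation, and join. The `flights` and `airlines` data and all function names are hypothetical. The key assumption is that the whole data set is at rest and available before the query runs.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical data set, fully available before the query runs (data at rest).
flights: List[Dict] = [
    {"flight": "DL1", "airline": "DL", "origin": "ATL", "delay_min": 12},
    {"flight": "UA7", "airline": "UA", "origin": "SFO", "delay_min": 0},
    {"flight": "DL9", "airline": "DL", "origin": "JFK", "delay_min": 30},
]
airlines: Dict[str, str] = {"DL": "Delta", "UA": "United"}


def project(rows: List[Dict], *fields: str) -> List[Dict]:
    """Projection: keep only the requested columns of every row."""
    return [{f: row[f] for f in fields} for row in rows]


def average_delay_by_airline(rows: List[Dict]) -> Dict[str, float]:
    """Aggregation: group rows by airline code and average the delay."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["airline"]] += row["delay_min"]
        counts[row["airline"]] += 1
    return {code: totals[code] / counts[code] for code in totals}


def join_airline_names(rows: List[Dict], names: Dict[str, str]) -> List[Dict]:
    """Join: enrich each row with the airline name looked up by code (inner join)."""
    return [{**row, "airline_name": names[row["airline"]]} for row in rows if row["airline"] in names]


print(project(flights, "flight", "delay_min"))
print(average_delay_by_airline(flights))  # {'DL': 21.0, 'UA': 0.0}
print(join_airline_names(flights, airlines)[0]["airline_name"])  # Delta
```

Each function reads every row before producing an answer. This is natural when the data is bounded, and it is exactly what stops working once the input never ends.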

Slide 20

Slide 20 text

Lambda architecture origins
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Lambda Architecture
https://mapr.com/developercentral/lambda-architecture/
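In the Lambda Architecture, a batch layer periodically recomputes views over the full master data set, and a speed layer covers only the events the last batch run has not seen yet. A serving layer answers queries by merging the two. The sketch below counts page views under that assumption. The event shape, the `cutoff` timestamp, and all function names are hypothetical, and the whole pipeline is collapsed into plain functions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical event: (page, timestamp_seconds)
Event = Tuple[str, int]


def batch_view(master_dataset: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts from scratch over all events before `cutoff`."""
    return Counter(page for page, ts in master_dataset if ts < cutoff)


def realtime_view(recent_events: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: incrementally count only events the batch run has not covered yet."""
    view: Counter = Counter()
    for page, ts in recent_events:
        if ts >= cutoff:
            view[page] += 1
    return view


def query(page: str, batch: Counter, realtime: Counter) -> int:
    """Serving layer: answer a query by merging both views."""
    return batch[page] + realtime[page]


events: List[Event] = [("/home", 10), ("/home", 20), ("/about", 30), ("/home", 95)]
cutoff = 60  # hypothetical start time of the last completed batch job
b = batch_view(events, cutoff)
r = realtime_view(events, cutoff)
print(query("/home", b, r))  # 3
```

The price of this design is visible even in the toy version: the same counting logic exists twice, once per layer, and both copies have to agree.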

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

TFW trying to explain the modern big data landscape

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

STREAM PROCESSING
Data in motion

Slide 28

Slide 28 text

Streaming Platform

Slide 29

Slide 29 text

Streaming Platform

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Before You Go
Interesting cases

Slide 32

Slide 32 text

I FOUND YOUR LACK OF FAULT TOLERANCE DISTURBING

Slide 33

Slide 33 text

Data is too important to store on one computer
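One way to avoid keeping data on a single computer is to divide it into partitions by key and store each partition on several machines. The sketch below assumes three brokers, four partitions, and a replication factor of 2. It uses `zlib.crc32` as a stand-in hash with a simple round-robin placement. This is not Kafka's actual partitioner or replica-assignment algorithm, and the broker names are hypothetical.

```python
import zlib
from typing import Dict, List


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from a stable hash of the key (crc32 is a stand-in, not Kafka's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions: int, brokers: List[str], replication_factor: int) -> Dict[int, List[str]]:
    """Place each partition on `replication_factor` distinct brokers, round-robin; the first copy acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


brokers = ["broker-1", "broker-2", "broker-3"]  # hypothetical cluster
replicas = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=2)
print(replicas[0])  # ['broker-1', 'broker-2']

# The same key always maps to the same partition, so per-key ordering can be kept.
print(partition_for("DL1", 4) == partition_for("DL1", 4))  # True

# With two copies of every partition, losing any single broker still leaves a live copy.
alive = {"broker-1", "broker-3"}
print(all(any(b in alive for b in copies) for copies in replicas.values()))  # True
```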

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

No content

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

How to process «infinite» data?
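A batch job waits for the end of its input, but an unbounded stream has no end. A common answer is to process one event at a time, keep a small amount of state, and emit an updated result after every event. The sketch below shows this with a running average over a never-ending generator. `sensor_readings` is a hypothetical source.

```python
import itertools
from typing import Iterator, Tuple


def sensor_readings() -> Iterator[int]:
    """Hypothetical unbounded source: yields 0, 1, ..., 9, 0, 1, ... forever."""
    for n in itertools.count():
        yield n % 10


def running_average(stream: Iterator[int]) -> Iterator[Tuple[int, float]]:
    """Consume one event at a time, keep O(1) state, and emit an updated result after each event."""
    count, total = 0, 0
    for value in stream:
        count += 1
        total += value
        yield count, total / count


# The source never ends, so we only look at the first few emitted updates.
first_five = list(itertools.islice(running_average(sensor_readings()), 5))
print(first_five[-1])  # (5, 2.0)
```

Each result is only "the answer so far". Windows, covered in the slides that follow, are one way to turn that into finite, well-defined answers.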

Slide 39

Slide 39 text

Time model

Slide 40

Slide 40 text

Time model
• Different use cases require different time semantics

Slide 41

Slide 41 text

Time model
• Different use cases require different time semantics
• The majority of use cases require event-time semantics

Slide 42

Slide 42 text

Time model
• Different use cases require different time semantics
• The majority of use cases require event-time semantics
• Other use cases may require processing-time or special variants like ingestion-time
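The three time notions can disagree about the same event. The sketch assumes each event carries an event time (when it happened), an ingestion time (when the platform received it), and a processing time (when the processor handled it), all in seconds. It counts events per minute using two of those clocks. The `Event` class and the sample values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical event carrying three timestamps, in seconds."""
    name: str
    event_time: int       # when it happened on the device
    ingestion_time: int   # when the platform received it
    processing_time: int  # when the stream processor handled it


def count_per_minute(events: Iterable[Event], time_of: Callable[[Event], int]) -> Counter:
    """Bucket events into 60-second buckets using the chosen notion of time."""
    return Counter(time_of(e) // 60 for e in events)


events = [
    Event("a", event_time=10, ingestion_time=12, processing_time=13),
    Event("b", event_time=50, ingestion_time=130, processing_time=131),  # delayed upload
    Event("c", event_time=70, ingestion_time=71, processing_time=72),
]

by_event_time = count_per_minute(events, lambda e: e.event_time)
by_processing_time = count_per_minute(events, lambda e: e.processing_time)
print(dict(by_event_time))       # {0: 2, 1: 1}
print(dict(by_processing_time))  # {0: 1, 2: 1, 1: 1}
```

Event "b" happened in minute 0 but was processed in minute 2. Only event-time bucketing reports what actually happened in each minute.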

Slide 43

Slide 43 text

Time Model

Slide 44

Slide 44 text

Time Model

Slide 45

Slide 45 text

Time Model

Slide 46

Slide 46 text

Windowing
[Figure: event-time windowing. Input data, where colors represent different users' events (alice, bob, dave); rectangles denote different event-time windows. Axes: processing-time, event-time.]

Slide 47

Slide 47 text

https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

Slide 48

Slide 48 text

Windowing
• Windowing is an operation that groups events
• Most commonly needed: time windows, session windows
• Examples:
  • Real-time monitoring: 5-minute averages
  • Reader behavior on a website: user browsing sessions
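Both window kinds from the examples can be sketched in a few lines. Tumbling time windows produce the 5-minute averages. Session windows group one user's clicks, and a pause longer than a chosen inactivity gap starts a new session. The function names, the 300-second gap, and the sample data are hypothetical. Timestamps are event times in seconds, and this plain-Python version is not how any particular stream processor implements windows.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FIVE_MINUTES = 5 * 60  # window size in seconds


def tumbling_averages(readings: List[Tuple[int, float]], size: int = FIVE_MINUTES) -> Dict[int, float]:
    """Time windows: average values per fixed, non-overlapping window, keyed by window start."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in readings:
        start = ts - ts % size  # align the timestamp to its window boundary
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


def sessions(click_times: List[int], gap: int) -> List[Tuple[int, int]]:
    """Session windows: group one user's clicks; a pause longer than `gap` starts a new session."""
    result: List[Tuple[int, int]] = []
    for ts in sorted(click_times):
        if result and ts - result[-1][1] <= gap:
            result[-1] = (result[-1][0], ts)  # extend the current session
        else:
            result.append((ts, ts))  # open a new session
    return result


print(tumbling_averages([(0, 10.0), (120, 20.0), (310, 30.0)]))  # {0: 15.0, 300: 30.0}
print(sessions([0, 40, 100, 1000, 1030], gap=300))  # [(0, 100), (1000, 1030)]
```

Tumbling windows have a fixed length known in advance. A session's length depends on the data itself, which is why sessions suit browsing behavior better than fixed windows do.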

Slide 49

Slide 49 text

Out-of-order and late data
• Very common in practice, not a rare corner case
• Related to the time model discussion

Slide 50

Slide 50 text

Out-of-order and late data

Slide 51

Slide 51 text

Out-of-order and late data
• Users with mobile phones board an airplane and lose Internet connectivity

Slide 52

Slide 52 text

Out-of-order and late data
• Users with mobile phones board an airplane and lose Internet connectivity
• Emails are written during the 10h flight

Slide 53

Slide 53 text

Out-of-order and late data
• Users with mobile phones board an airplane and lose Internet connectivity
• Emails are written during the 10h flight
• Internet connectivity is restored, and the phones now send the queued emails
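The airplane scenario can be simulated to show why arrival order cannot be trusted. The sketch assumes one user on the ground whose emails arrive immediately and one "flyer" whose emails, written during the 10h flight, all arrive after landing. The user names and timestamps are hypothetical. Counting per hour by when an email was written (event time) gives a different picture than counting by when it was received (processing time).

```python
from collections import Counter
from typing import List, Tuple

HOUR = 3600
Email = Tuple[str, int, int]  # (user, written_at, received_at), seconds since takeoff

# Hypothetical emails in the order the server receives them.
arrivals: List[Email] = [
    ("ground", 2 * HOUR, 2 * HOUR + 1),
    ("ground", 5 * HOUR, 5 * HOUR + 1),
    ("flyer", 1 * HOUR, 10 * HOUR),      # queued during the flight,
    ("flyer", 4 * HOUR, 10 * HOUR + 1),  # sent only after connectivity
    ("flyer", 9 * HOUR, 10 * HOUR + 2),  # is restored
]


def is_out_of_order(events: List[Email]) -> bool:
    """True if some email arrives after another email that was written later than it."""
    max_seen = -1
    for _, written_at, _ in events:
        if written_at < max_seen:
            return True
        max_seen = max(max_seen, written_at)
    return False


def emails_per_hour(events: List[Email], use_event_time: bool) -> Counter:
    """Count emails per hour, by when they were written (event time) or received (processing time)."""
    return Counter((written if use_event_time else received) // HOUR for _, written, received in events)


print(is_out_of_order(arrivals))  # True
print(sorted(emails_per_hour(arrivals, use_event_time=True).items()))
# [(1, 1), (2, 1), (4, 1), (5, 1), (9, 1)]
print(sorted(emails_per_hour(arrivals, use_event_time=False).items()))
# [(2, 1), (5, 1), (10, 3)]
```

Processing-time counting reports a burst of three emails in hour 10 that never happened. Event-time counting puts each email back in the hour it was written.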

Slide 54

Slide 54 text

Stream Processing: results

Slide 55

Slide 55 text

Stream Processing: results
• Yes, it’s possible to get computation results in real time

Slide 56

Slide 56 text

Stream Processing: results
• Yes, it’s possible to get computation results in real time
• Windows – finite view of infinite data
  • Based on temporal characteristics of the event

Slide 57

Slide 57 text

Stream Processing: results
• Yes, it’s possible to get computation results in real time
• Windows – finite view of infinite data
  • Based on temporal characteristics of the event
• Late event processing
  • You choose how long to wait
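"You choose how long to wait" can be modeled as a grace period on each event-time window. In this sketch, "stream time" is the largest event time seen so far. A window [start, start + size) still accepts late events until stream time reaches start + size + grace, and later arrivals for that window are dropped. Kafka Streams offers a comparable window grace setting, but this code does not use or reproduce that API. The function name and sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def windowed_counts_with_grace(
    events: List[Tuple[str, int]], size: int, grace: int
) -> Tuple[Dict[int, int], List[str]]:
    """Count events per tumbling event-time window, accepting late events only within `grace`.

    Stream time is the largest event time seen so far; the window [start, start + size)
    closes once stream time reaches start + size + grace.
    """
    counts: Dict[int, int] = defaultdict(int)
    dropped: List[str] = []
    stream_time = 0
    for name, ts in events:
        stream_time = max(stream_time, ts)
        start = ts - ts % size
        if stream_time >= start + size + grace:
            dropped.append(name)  # too late: its window has already closed
        else:
            counts[start] += 1
    return dict(counts), dropped


# (name, event_time) in arrival order; "x" and "y" arrive late for window [0, 60).
arrivals = [("a", 10), ("b", 65), ("x", 30), ("c", 130), ("y", 40)]
print(windowed_counts_with_grace(arrivals, size=60, grace=30))
# ({0: 2, 60: 1, 120: 1}, ['y'])
```

A longer grace period accepts more late data but delays final results and keeps more window state around. With `grace=0`, "x" would be dropped as well.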

Slide 58

Slide 58 text

DEMO
Let’s analyze flights

Slide 59

Slide 59 text

https://www.confluent.io/blog/predicting-flight-arrivals-with-the-apache-kafka-streams-api/

Slide 60

Slide 60 text

Example: Training Flight Prediction Model

Slide 61

Slide 61 text

https://github.com/confluentinc/online-inferencing-blog-application

Slide 62

Slide 62 text

Thanks! Questions?
@gamussa