Slide 1

Slide 1 text

@gamussa @confluentinc @thephillyjug Divide, Distribute and Conquer:
 Stream v. Batch

Slide 2

Slide 2 text

Stream v. Batch

Slide 3

Slide 3 text

Who am I?

Slide 4

Slide 4 text

Solutions Architect Who am I?

Slide 5

Slide 5 text

Solutions Architect Developer Advocate Who am I?

Slide 6

Slide 6 text

Solutions Architect Developer Advocate @gamussa in internetz Who am I?

Slide 7

Slide 7 text

Solutions Architect Developer Advocate @gamussa in internetz Hey you, yes, you, go follow me on Twitter Who am I?

Slide 8

Slide 8 text

@gamussa @confluentinc @thephillyjug Disclaimer:
 


Slide 9

Slide 9 text

@gamussa @confluentinc @thephillyjug BATCH PROCESSING Data at rest

Slide 10

Slide 10 text

@gamussa @confluentinc @thephillyjug Data and Queries Origin and processing

Slide 11

Slide 11 text

@gamussa @confluentinc @thephillyjug

Slide 12

Slide 12 text

@gamussa @confluentinc @thephillyjug Data…

Slide 13

Slide 13 text

@gamussa @confluentinc @thephillyjug Data…

Slide 14

Slide 14 text

@gamussa @confluentinc @thephillyjug ✓ … inherently immutable Data… ✓ … time-based

Slide 15

Slide 15 text

@gamussa @confluentinc @thephillyjug CRUD -> CR
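One way to read "CRUD -> CR": if data is immutable and time-based, a store only needs Create and Read, and an "update" becomes a newer fact. The sketch below is a minimal illustration under that assumption; `Event`, `EventLog` and `current_votes` are hypothetical names, not part of the talk.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)  # frozen: a recorded fact cannot be modified afterwards
class Event:
    timestamp: int  # when the fact became true (epoch seconds, assumed)
    user: str
    vote: str


class EventLog:
    """Hypothetical append-only store: Create and Read, no Update or Delete."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        self._events.append(event)

    def read_all(self) -> List[Event]:
        return list(self._events)  # return a copy so callers cannot mutate the log


def current_votes(log: EventLog) -> Dict[str, str]:
    """Derive the current state: the latest event per user wins."""
    latest: Dict[str, Event] = {}
    for e in log.read_all():
        if e.user not in latest or e.timestamp >= latest[e.user].timestamp:
            latest[e.user] = e
    return {user: e.vote for user, e in latest.items()}


log = EventLog()
log.append(Event(1, "alice", "yes"))
log.append(Event(2, "bob", "no"))
log.append(Event(3, "alice", "no"))  # alice changes her mind: a new fact, not an UPDATE
```

All three facts stay in the log; the "current" vote is just a query over them.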

Slide 16

Slide 16 text

@gamussa @confluentinc @thephillyjug Processing is a query

Slide 17

Slide 17 text

@gamussa @confluentinc @thephillyjug Processing is a query Function on full data set

Slide 18

Slide 18 text

@gamussa @confluentinc @thephillyjug Processing is a query Function on full data set Projection

Slide 19

Slide 19 text

@gamussa @confluentinc @thephillyjug Processing is a query Function on full data set Projection Aggregations

Slide 20

Slide 20 text

@gamussa @confluentinc @thephillyjug Processing is a query Function on full data set Projection Aggregations Joins

Slide 21

Slide 21 text

SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/07/2017" GROUP BY user_vote;

Slide 22

Slide 22 text

SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/08/2017" GROUP BY user_vote;

Slide 23

Slide 23 text

SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/08/2017" GROUP BY user_vote;
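The same batch query can be sketched as a plain function over the full data set. This is only an illustration: `AccessLogRow`, `votes_between` and the sample rows are made up, and the dates are assumed to mean April 7 and April 8, 2017.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple


class AccessLogRow(NamedTuple):
    event_date: date
    user_vote: str


def votes_between(rows: Iterable[AccessLogRow], start: date, end: date) -> Counter:
    """WHERE event_date BETWEEN start AND end (inclusive) GROUP BY user_vote."""
    return Counter(r.user_vote for r in rows if start <= r.event_date <= end)


# Hypothetical data at rest: the query always scans the whole table.
access_log = [
    AccessLogRow(date(2017, 4, 6), "yes"),
    AccessLogRow(date(2017, 4, 7), "yes"),
    AccessLogRow(date(2017, 4, 7), "no"),
    AccessLogRow(date(2017, 4, 8), "yes"),
    AccessLogRow(date(2017, 4, 9), "no"),
]

result = votes_between(access_log, date(2017, 4, 7), date(2017, 4, 8))
```

Widening the date range means re-running the function over all rows again, which is the cost batch processing accepts.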

Slide 24

Slide 24 text

@gamussa @confluentinc @thephillyjug Lambda architecture origins http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

@gamussa @confluentinc @thephillyjug Lambda Architecture

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

@gamussa @confluentinc @thephillyjug TFW Trying to explain modern big data landscape

Slide 29

Slide 29 text

@gamussa @confluentinc @thephillyjug Precomputed Results http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

Slide 30

Slide 30 text

@gamussa @confluentinc @thephillyjug Batch Process http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
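A rough sketch of the idea behind precomputed results in a Lambda-style design: a batch process recomputes a view over all data up to a cutoff, a realtime layer covers what arrived since, and a query merges the two. The function names and the `(timestamp, vote)` tuple layout are assumptions for illustration only.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, vote), an assumed layout


def build_batch_view(master_dataset: Iterable[Event], batch_cutoff: int) -> Counter:
    """Batch process: recompute counts from scratch over all data up to the cutoff."""
    return Counter(vote for ts, vote in master_dataset if ts <= batch_cutoff)


def build_realtime_view(events: Iterable[Event], batch_cutoff: int) -> Counter:
    """Realtime layer: only the events the last batch run has not seen yet."""
    return Counter(vote for ts, vote in events if ts > batch_cutoff)


def query(batch_view: Counter, realtime_view: Counter) -> Counter:
    """Answer a query by merging the precomputed view with the recent delta."""
    return batch_view + realtime_view


events = [(1, "yes"), (2, "no"), (3, "yes"), (11, "yes"), (12, "no")]
batch_view = build_batch_view(events, batch_cutoff=10)
realtime_view = build_realtime_view(events, batch_cutoff=10)
answer = query(batch_view, realtime_view)
```

The merge is trivial here because counts are additive; other aggregations may need more care.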

Slide 31

Slide 31 text

@gamussa @confluentinc @thephillyjug STREAM PROCESSING Data in motion

Slide 32

Slide 32 text

@gamussa @confluentinc @thephillyjug Streaming Platform

Slide 33

Slide 33 text

@gamussa @confluentinc @thephillyjug Streaming Platform

Slide 34

Slide 34 text

@gamussa @confluentinc @thephillyjug Directed Acyclic Graph

Slide 35

Slide 35 text

@gamussa @confluentinc @thephillyjug DEMO

Slide 36

Slide 36 text

@gamussa @confluentinc @thephillyjug DEMO

Slide 37

Slide 37 text

@gamussa @confluentinc @thephillyjug Interesting cases Before You Go

Slide 38

Slide 38 text

I FOUND YOUR LACK OF FAULT TOLERANCE DISTURBING

Slide 39

Slide 39 text

Data is too important to store on one computer

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

@gamussa @confluentinc @thephillyjug How to process «infinite» data?

Slide 45

Slide 45 text

@gamussa @confluentinc @thephillyjug Time model

Slide 46

Slide 46 text

@gamussa @confluentinc @thephillyjug Time model Different use cases require different time semantics

Slide 47

Slide 47 text

@gamussa @confluentinc @thephillyjug Time model Different use cases require different time semantics Majority of use cases require event-time semantics

Slide 48

Slide 48 text

@gamussa @confluentinc @thephillyjug Time model Different use cases require different time semantics Majority of use cases require event-time semantics Other use cases may require processing-time or special variants like ingestion-time
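The three time semantics can be told apart with a small sketch. `Record`, `TimeSemantics` and `timestamp_of` are hypothetical names; timestamps are assumed to be epoch seconds.

```python
from dataclasses import dataclass
from enum import Enum


class TimeSemantics(Enum):
    EVENT_TIME = "event"            # when the event happened at the source
    INGESTION_TIME = "ingestion"    # when the platform stored the event
    PROCESSING_TIME = "processing"  # when the application handles the event


@dataclass(frozen=True)
class Record:
    payload: str
    event_time: int
    ingestion_time: int


def timestamp_of(record: Record, semantics: TimeSemantics, now: int) -> int:
    """Pick the timestamp a computation should use under the chosen semantics."""
    if semantics is TimeSemantics.EVENT_TIME:
        return record.event_time
    if semantics is TimeSemantics.INGESTION_TIME:
        return record.ingestion_time
    return now  # processing-time: the wall clock at the moment of handling


# An email written offline at t=100 and delivered 10 hours (36000 s) later.
email = Record("hello", event_time=100, ingestion_time=36100)
processing_clock = 36105

by_event = timestamp_of(email, TimeSemantics.EVENT_TIME, processing_clock)
by_ingestion = timestamp_of(email, TimeSemantics.INGESTION_TIME, processing_clock)
by_processing = timestamp_of(email, TimeSemantics.PROCESSING_TIME, processing_clock)
```

Only the event-time value says when the email was actually written; the other two describe the pipeline, not the user.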

Slide 49

Slide 49 text

@gamussa @confluentinc @thephillyjug Time Model

Slide 50

Slide 50 text

@gamussa @confluentinc @thephillyjug Time Model

Slide 51

Slide 51 text

@gamussa @confluentinc @thephillyjug Time Model

Slide 52

Slide 52 text

Finite Representation Of Infinite Data

Slide 53

Slide 53 text

@gamussa @confluentinc @thephillyjug Windowing Windowing is an operation that groups events

Slide 54

Slide 54 text

@gamussa @confluentinc @thephillyjug https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

Slide 55

Slide 55 text

@gamussa @confluentinc @thephillyjug Windowing Input data, where colors represent different users' events (alice, bob, dave). Rectangles denote different event-time windows. [Figure: event-time windowing, plotted on processing-time and event-time axes]
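A minimal sketch of event-time windowing per user, assuming fixed (tumbling) 5-minute windows and events given as `(user, event_time_seconds)` pairs. The names and the window size are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SIZE = 300  # seconds; a 5-minute window, chosen for illustration


def window_start(event_time: int, size: int = WINDOW_SIZE) -> int:
    """Align an event-time timestamp to the start of its fixed window."""
    return event_time - (event_time % size)


def count_per_user_window(events: Iterable[Tuple[str, int]]) -> Dict[Tuple[str, int], int]:
    """Group events by (user, window start) using event time, not arrival order."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for user, event_time in events:
        counts[(user, window_start(event_time))] += 1
    return dict(counts)


# Listed in arrival (processing-time) order; alice's event at 305 arrives after dave's at 310.
events = [("alice", 10), ("bob", 20), ("alice", 290), ("dave", 310), ("alice", 305), ("bob", 650)]
counts = count_per_user_window(events)
```

Because the window is derived from the event's own timestamp, arrival order does not change which window an event lands in.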

Slide 56

Slide 56 text

@gamussa @confluentinc @thephillyjug Windowing Windowing is an operation that groups events Most commonly needed: time windows, session windows Examples: ✗Real-time monitoring: 5-minute averages ✗Reader behavior on a website: user browsing sessions
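Session windows, such as user browsing sessions, can be sketched as grouping one user's events until a gap of inactivity. The function name `sessionize` and the 30-minute gap are assumptions for illustration.

```python
from typing import Iterable, List


def sessionize(timestamps: Iterable[int], gap: int) -> List[List[int]]:
    """Split one user's event times into sessions separated by more than `gap` seconds."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):  # order by event time, not by arrival
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the inactivity gap: same session
        else:
            sessions.append([ts])    # gap exceeded (or first event): new session
    return sessions


# Page views for one reader, in seconds; 1800 s = 30 minutes of inactivity.
clicks = [0, 60, 100, 2000, 2030, 5000]
sessions = sessionize(clicks, gap=1800)
```

Unlike fixed time windows, session boundaries depend on the data itself, so their number and length are not known in advance.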

Slide 57

Slide 57 text

@gamussa @confluentinc @thephillyjug Fatality

Slide 58

Slide 58 text

@gamussa @confluentinc @thephillyjug Out-of-order and late data Is very common in practice, not a rare corner case ✗Related to time model discussion

Slide 59

Slide 59 text

@gamussa @confluentinc @thephillyjug Out-of-order and late data

Slide 60

Slide 60 text

@gamussa @confluentinc @thephillyjug Out-of-order and late data Users with mobile phones enter
 airplane, lose Internet connectivity

Slide 61

Slide 61 text

@gamussa @confluentinc @thephillyjug Out-of-order and late data Users with mobile phones enter
 airplane, lose Internet connectivity Emails are being written
 during the 10h flight

Slide 62

Slide 62 text

@gamussa @confluentinc @thephillyjug Out-of-order and late data Users with mobile phones enter
 airplane, lose Internet connectivity Emails are being written
 during the 10h flight Internet connectivity is restored,
 phones will send queued emails now

Slide 63

Slide 63 text

@gamussa @confluentinc @thephillyjug Stream Processing: results

Slide 64

Slide 64 text

@gamussa @confluentinc @thephillyjug Stream Processing: results • Yes, it’s possible to get computation results in real time

Slide 65

Slide 65 text

@gamussa @confluentinc @thephillyjug Stream Processing: results • Yes, it’s possible to get computation results in real time • Windows – finite view of infinite data • Based on temporal characteristics of the event

Slide 66

Slide 66 text

@gamussa @confluentinc @thephillyjug Stream Processing: results • Yes, it’s possible to get computation results in real time • Windows – finite view of infinite data • Based on temporal characteristics of the event • Late event processing • You choose how long to wait
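"You choose how long to wait" can be sketched as a grace period: a window keeps accepting late events until the stream's observed time passes the window end plus the grace. This is a simplified model, loosely inspired by how stream processors handle lateness, not the API of any particular library; `WindowedCounter` and its fields are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class WindowedCounter:
    """Counts events per fixed event-time window, accepting late events within a grace period."""

    def __init__(self, size: int, grace: int) -> None:
        self.size = size      # window length in seconds
        self.grace = grace    # how long to wait for late events, in seconds
        self.stream_time = 0  # highest event time observed so far
        self.counts: Dict[int, int] = defaultdict(int)
        self.dropped = 0

    def process(self, event_time: int) -> bool:
        """Return True if the event was counted, False if it arrived too late."""
        self.stream_time = max(self.stream_time, event_time)
        start = event_time - event_time % self.size
        window_end = start + self.size
        if self.stream_time >= window_end + self.grace:
            self.dropped += 1  # the window is closed: the event is discarded
            return False
        self.counts[start] += 1
        return True


counter = WindowedCounter(size=60, grace=30)
# 55 arrives out of order but within the grace period; 20 arrives after the window closed.
accepted = [counter.process(t) for t in [10, 50, 70, 55, 100, 20]]
```

A longer grace period admits more late data at the cost of holding window state open longer before results can be considered final.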

Slide 67

Slide 67 text

@gamussa @confluentinc @thephillyjug https://github.com/confluentinc/kafka-streams-examples

Slide 68

Slide 68 text

@gamussa @confluentinc @thephillyjug Thanks! Questions? @gamussa [email protected]