[Philly JUG] Divide, Distribute and Conquer: Stream v. Batch

@gamussa @confluentinc @thephillyjug Divide, Distribute and Conquer:  Stream v. Batch


Who am I?
• Solutions Architect
• Developer Advocate
• @gamussa in internetz
Hey you, yes, you, go follow me on Twitter

Disclaimer:

BATCH PROCESSING
Data at rest

Data and Queries
Origin and processing

Data…
✓ … inherently immutable
✓ … time-based

CRUD -> CR

Processing is a query
Function on full data set
• Projection
• Aggregations
• Joins

SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/07/2017"
GROUP BY user_vote;

SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/08/2017"
GROUP BY user_vote;

SELECT user_vote, count(*) FROM AccessLog WHERE event_date BETWEEN "04/07/2017" AND "04/08/2017"
GROUP BY user_vote;
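
The queries above treat processing as a function over the full data set at rest. A minimal Python sketch of the same idea, assuming a hypothetical in-memory `access_log` list (only the column names `user_vote` and `event_date` come from the slides; the rows and the helper `votes_between` are made up for illustration):

```python
from collections import Counter
from datetime import date

# Hypothetical rows standing in for the AccessLog table; values are illustrative only.
access_log = [
    {"user_vote": "up", "event_date": date(2017, 4, 7)},
    {"user_vote": "down", "event_date": date(2017, 4, 7)},
    {"user_vote": "up", "event_date": date(2017, 4, 8)},
    {"user_vote": "up", "event_date": date(2017, 4, 9)},
]


def votes_between(rows, start, end):
    """Batch query: scan every row, keep dates in [start, end], count per vote."""
    return Counter(
        row["user_vote"] for row in rows if start <= row["event_date"] <= end
    )


# Widening the date range means scanning the whole data set again.
one_day = votes_between(access_log, date(2017, 4, 7), date(2017, 4, 7))
two_days = votes_between(access_log, date(2017, 4, 7), date(2017, 4, 8))
```

Each call rescans all rows, which is the cost that precomputed views and stream processing try to avoid.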

Lambda architecture origins
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

Lambda Architecture

TFW Trying to explain modern big data landscape

Precomputed Results
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

Batch Process
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
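
The slides here survive only as titles, so the following is a rough sketch of the general Lambda idea rather than the speaker's own example: a batch job precomputes a view over all data at rest, and a query merges that view with events the last batch run has not yet absorbed. The function names and sample events are hypothetical.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer (sketch): recompute the view from scratch over all data."""
    return Counter(event["user_vote"] for event in master_dataset)


def query(precomputed, recent_events):
    """Merge the precomputed view with events that arrived after the batch run."""
    realtime_view = Counter(event["user_vote"] for event in recent_events)
    return precomputed + realtime_view


# Illustrative data only.
master_dataset = [{"user_vote": "up"}, {"user_vote": "down"}, {"user_vote": "up"}]
recent_events = [{"user_vote": "up"}]

result = query(batch_view(master_dataset), recent_events)
```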

STREAM PROCESSING
Data in motion

Streaming Platform

Directed Acyclic Graph

DEMO

Interesting cases
Before You Go

I FOUND YOUR LACK OF FAULT TOLERANCE DISTURBING

Data is too important to store on just one computer

How to process «infinite» data?

Time model
Different use cases, different time semantics
• Majority of use cases require event-time semantics
• Other use cases may require processing-time or special variants like ingestion-time
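
To make the difference between time semantics concrete, here is a small sketch that counts the same events per minute twice, once by event-time and once by ingestion-time. The `Event` class, its field names, and the sample timestamps are assumptions for illustration; processing-time (the wall clock when the record is handled) is left out because it depends on when the code runs.

```python
from dataclasses import dataclass


@dataclass
class Event:
    user: str
    event_time: int      # seconds: when the event happened at the source
    ingestion_time: int  # seconds: when the platform received it


def count_per_minute(events, timestamp_of):
    """Count events per 60-second bucket, using the chosen notion of time."""
    counts = {}
    for event in events:
        ts = timestamp_of(event)
        bucket = ts - ts % 60
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts


events = [
    Event("alice", event_time=10, ingestion_time=12),
    Event("bob", event_time=50, ingestion_time=65),  # delayed in transit
    Event("dave", event_time=70, ingestion_time=71),
]

by_event_time = count_per_minute(events, lambda e: e.event_time)
by_ingestion_time = count_per_minute(events, lambda e: e.ingestion_time)
```

Bob's delayed event lands in the first minute under event-time but in the second minute under ingestion-time, so the two results differ for identical input.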


Finite Representation Of Infinite Data


https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

Windowing
[Figure: event-time windowing. Input data, where colors represent different users' events (alice, bob, dave); rectangles denote different event-time windows. Axis labels: processing-time, event-time.]

Windowing
Windowing is an operation that groups events
Most commonly needed: time windows, session windows
Examples:
✗ Real-time monitoring: 5-minute averages
✗ Reader behavior on a website: user browsing sessions
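
The two window types named above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a stream processor's API: timestamps are event-times in seconds, the 5-minute size matches the monitoring example, and the 30-minute inactivity gap for sessions is an arbitrary choice.

```python
def tumbling_averages(readings, size=300):
    """Group (event_time, value) pairs into fixed 5-minute windows and average each."""
    sums = {}
    for ts, value in readings:
        start = ts - ts % size  # window start this reading falls into
        total, n = sums.get(start, (0.0, 0))
        sums[start] = (total + value, n + 1)
    return {start: total / n for start, (total, n) in sums.items()}


def sessions(timestamps, gap=1800):
    """Split one user's event times into sessions separated by more than `gap` seconds."""
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] <= gap:
            result[-1].append(ts)  # still within the current session
        else:
            result.append([ts])    # inactivity gap exceeded: start a new session
    return result


averages = tumbling_averages([(0, 10.0), (120, 20.0), (300, 40.0)])
browsing = sessions([0, 600, 5000, 5100])
```

Time windows have boundaries fixed by the clock, while session windows are defined by the data itself, which is why the two cases need different grouping logic.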

Fatality

Out-of-order and late data
✗ Is very common in practice, not a rare corner case
✗ Related to time model discussion

Out-of-order and late data
• Users with mobile phones enter airplane, lose Internet connectivity
• Emails are being written during the 10h flight
• Internet connectivity is restored, phones will send queued emails now

Stream Processing: results
• Yes, it’s possible to get computation results in real time
• Windows – finite view of infinite data
• Based on temporal characteristics of the event
• Late event processing
• You choose how long to wait
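
"You choose how long to wait" can be sketched as a grace period on each window: out-of-order events are still counted until the highest event-time seen so far passes the window end plus the grace, and anything later is dropped. The function below is a simplified model under those assumptions; the names `size` and `grace` and the sample timestamps are illustrative and do not correspond to a specific library API.

```python
def count_with_grace(event_times, size=60, grace=30):
    """Count events per tumbling window in arrival order, dropping events
    whose window closed more than `grace` seconds ago."""
    counts, dropped = {}, []
    stream_time = 0  # highest event time observed so far
    for ts in event_times:
        stream_time = max(stream_time, ts)
        start = ts - ts % size
        if stream_time >= start + size + grace:
            dropped.append(ts)  # too late: window end + grace has passed
        else:
            counts[start] = counts.get(start, 0) + 1
    return counts, dropped


# Arrival order differs from event-time order: 50 and 55 arrive late.
counts, dropped = count_with_grace([10, 70, 50, 130, 55])
```

Here the event at time 50 arrives out of order but within the grace period and is counted, while the event at time 55 arrives after its window has closed and is dropped. A longer grace trades result latency for completeness.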

https://github.com/confluentinc/kafka-streams-examples

Thanks! Questions? @gamussa [email protected]


More Decks by Viktor Gamov
