
[Philly JUG] Divide, Distribute and Conquer: Stream v. Batch

Viktor Gamov
September 13, 2017

Data is flowing everywhere around us: from phones, credit cards, sensor-equipped buildings, vending machines, thermostats, trains, buses, planes, posts to social media, digital pictures, video and so on. Simple data collection is not enough anymore. Most current systems process data via nightly extract, transform, and load (ETL) operations. This approach, common in enterprise environments, requires decision makers to wait an entire day (or night) for reports to become available.

But businesses don’t want «Big Data» anymore. They want «Fast Data». What distinguishes a «streaming system» from a batch system is that the event stream is unbounded, or “infinite”, from the system’s perspective.
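To make the bounded versus unbounded distinction concrete, here is a minimal plain-Python sketch (not code from the talk, and not the API of any of the tools it names). A batch job can wait for the whole dataset and compute once; a streaming computation has to keep state and emit an updated result per event, because "the end" never arrives. The function names are hypothetical.

```python
from itertools import count, islice
from typing import Iterable, Iterator


def batch_total(events: list[int]) -> int:
    # Batch: the input is bounded, so the job can wait for all of it
    # (e.g. a nightly ETL run) and compute the result exactly once.
    return sum(events)


def streaming_totals(events: Iterable[int]) -> Iterator[int]:
    # Streaming: the input may be unbounded, so keep running state and
    # emit an updated result after every event instead of waiting for an end.
    total = 0
    for amount in events:
        total += amount
        yield total


if __name__ == "__main__":
    purchases = [5, 12, 3]
    print(batch_total(purchases))             # 20
    print(list(streaming_totals(purchases)))  # [5, 17, 20]
    # Works on an endless source too; we only look at the first few results.
    print(list(islice(streaming_totals(count(1)), 4)))  # [1, 3, 6, 10]
```

Calling `batch_total` on the endless `count(1)` source would never return, which is exactly why unbounded streams need a different processing model.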

Decision-makers need to analyze these streaming events as a whole to make business decisions as new information arrives. In this talk, after a short introduction to common approaches and architectures (lambda, kappa), Viktor will demonstrate how to use open-source stream processing tools (Flink, Kafka Streams, Hazelcast Jet).

Transcript

  1. @gamussa @confluentinc @thephillyjug Time model
    • Different use cases require different time semantics
    • The majority of use cases require event-time semantics
    • Other use cases may require processing-time or special variants like ingestion-time
  2. Windowing
    • Input data, where colors represent different users' events
    • Rectangles denote different event-time windows
    [Figure: events from users alice, bob and dave grouped into event-time windows; axes: processing-time, event-time]
  3. Windowing
    • Windowing is an operation that groups events
    • Most commonly needed: time windows, session windows
    • Examples:
      • Real-time monitoring: 5-minute averages
      • Reader behavior on a website: user browsing sessions
  4. Out-of-order and late data
    • Very common in practice, not a rare corner case
    • Related to the time model discussion
  5. Out-of-order and late data
    • Users with mobile phones board an airplane and lose Internet connectivity
  6. Out-of-order and late data
    • Users with mobile phones board an airplane and lose Internet connectivity
    • Emails are being written during the 10h flight
  7. Out-of-order and late data
    • Users with mobile phones board an airplane and lose Internet connectivity
    • Emails are being written during the 10h flight
    • Internet connectivity is restored, and the phones now send the queued emails
  8. Stream Processing: results
    • Yes, it’s possible to get computation results in real time
    • Windows – a finite view of infinite data
    • Based on temporal characteristics of the event
  9. Stream Processing: results
    • Yes, it’s possible to get computation results in real time
    • Windows – a finite view of infinite data
    • Based on temporal characteristics of the event
    • Late event processing
    • You choose how long to wait
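Slide 1's time model can be illustrated with a small plain-Python sketch (not Flink, Kafka Streams or Hazelcast Jet code). The `Event` fields and the `timestamp_of` helper are hypothetical; the sketch assumes the device stamps the event-time and the system stamps the ingestion-time on arrival, while processing-time is just the processing machine's clock.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class TimeSemantics(Enum):
    EVENT_TIME = "event-time"            # when the event happened (set by the source)
    INGESTION_TIME = "ingestion-time"    # when the event entered the system
    PROCESSING_TIME = "processing-time"  # when an operator happens to process it


@dataclass(frozen=True)
class Event:
    key: str
    event_time: float      # epoch seconds, stamped by the device (assumption)
    ingestion_time: float  # epoch seconds, stamped on arrival (assumption)


def timestamp_of(event: Event, semantics: TimeSemantics,
                 clock: Callable[[], float] = time.time) -> float:
    """Pick the timestamp used for windowing under the chosen time semantics."""
    if semantics is TimeSemantics.EVENT_TIME:
        return event.event_time
    if semantics is TimeSemantics.INGESTION_TIME:
        return event.ingestion_time
    # Processing-time: whatever the processing machine's clock says right now.
    return clock()


if __name__ == "__main__":
    e = Event("alice", event_time=100.0, ingestion_time=105.0)
    for s in TimeSemantics:
        print(s.value, timestamp_of(e, s, clock=lambda: 130.0))
```

Only the processing-time answer depends on when the code runs, so reprocessing the same events can give different results; event-time results are reproducible.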
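The event-time windowing on slide 2 (events from alice, bob and dave grouped into windows) can be sketched as tumbling, fixed-size windows keyed by user. This is a minimal illustration under the assumption that each event is a `(user, event_time)` pair with integer timestamps; the helper names are hypothetical.

```python
from collections import defaultdict


def window_start(ts: int, size: int) -> int:
    # Tumbling (fixed-size, non-overlapping) windows: align a timestamp
    # to the start of the window that contains it.
    return ts - ts % size


def count_per_user_window(events, size: int) -> dict:
    """events: iterable of (user, event_time). Returns {(user, window_start): count}."""
    counts = defaultdict(int)
    for user, ts in events:
        counts[(user, window_start(ts, size))] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [("alice", 1), ("bob", 3), ("alice", 7), ("dave", 12), ("alice", 14)]
    print(count_per_user_window(events, size=10))
    # {('alice', 0): 2, ('bob', 0): 1, ('dave', 10): 1, ('alice', 10): 1}
```

Because each event is assigned to a window by its own timestamp, feeding the same events in a different arrival order yields the same counts.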
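For the "user browsing sessions" example on slide 3, a session window closes after a period of inactivity. The sketch below is a simplified, batch-style illustration for one user's timestamps; the `sessionize` name and the 5-minute gap in the usage example are assumptions, not values from the talk.

```python
def sessionize(timestamps, gap):
    """Split one user's event timestamps into sessions.

    A new session starts whenever the time since the previous event is
    larger than `gap`. Returns a list of (start, end, event_count) tuples.
    """
    sessions = []
    for ts in sorted(timestamps):  # sorting stands in for event-time ordering
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the inactivity gap
        else:
            sessions.append([ts])    # gap exceeded: open a new session
    return [(s[0], s[-1], len(s)) for s in sessions]


if __name__ == "__main__":
    # Page views (seconds) for one reader, 5-minute (300 s) inactivity gap.
    print(sessionize([0, 60, 120, 2000, 2100], gap=300))
    # [(0, 120, 3), (2000, 2100, 2)]
```

Sorting only works here because the input is finite; a stream processor cannot sort an unbounded stream, so it has to create and merge session windows incrementally as events arrive.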
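Slide 4's point that out-of-order data is common can be made concrete by scanning arrival order against event-time: an event is out of order if its timestamp is earlier than one already seen. A minimal sketch with a hypothetical helper name:

```python
def out_of_order_indices(event_times):
    """Return positions of events whose event-time is earlier than the
    latest event-time already seen, i.e. events that arrived out of order."""
    max_seen = float("-inf")
    late = []
    for i, ts in enumerate(event_times):
        if ts < max_seen:
            late.append(i)
        else:
            max_seen = ts
    return late


if __name__ == "__main__":
    print(out_of_order_indices([1, 2, 5, 3, 6, 4]))  # [3, 5]
```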
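The airplane scenario on slides 5 to 7 shows why the time model matters. In the sketch below, with hypothetical numbers (three emails written during a 10h flight, all delivered a few seconds after landing), bucketing by processing-time lumps every email into the landing hour, while bucketing by event-time puts each email in the hour it was actually written. The `Email` type and `hourly_counts` helper are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass(frozen=True)
class Email:
    written_at: int  # event-time: when the user wrote it on the plane
    arrived_at: int  # processing-time: when the server received it


def hourly_counts(emails, time_of) -> dict:
    """Count emails per hour bucket, using `time_of` to pick the timestamp."""
    return dict(Counter(time_of(e) // HOUR for e in emails))


# Hypothetical numbers: emails written in hours 1, 4 and 8 of the flight,
# all delivered 5 seconds after landing when connectivity returns.
LANDING = 10 * HOUR
emails = [Email(written_at=h * HOUR + 60, arrived_at=LANDING + 5) for h in (1, 4, 8)]

if __name__ == "__main__":
    print(hourly_counts(emails, lambda e: e.arrived_at))  # {10: 3}
    print(hourly_counts(emails, lambda e: e.written_at))  # {1: 1, 4: 1, 8: 1}
```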
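Slide 8's "results in real time" and "windows as a finite view of infinite data" can be combined in one sketch: a running average per 5-minute event-time window, with an updated result emitted after every event rather than at the end of the stream. This is a plain-Python illustration under assumed `(event_time, value)` inputs; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple


def windowed_average_updates(
    events: Iterable[Tuple[int, float]], size: int
) -> Iterator[Tuple[int, float]]:
    """For each (event_time, value), update that event's tumbling window and
    immediately yield (window_start, current_average) for it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in events:
        start = ts - ts % size  # window chosen by the event's own timestamp
        sums[start] += value
        counts[start] += 1
        yield start, sums[start] / counts[start]


if __name__ == "__main__":
    # 5-minute (300 s) windows; the last reading arrives late but still
    # updates the earlier window it belongs to.
    readings = [(10, 20.0), (100, 22.0), (310, 30.0), (200, 24.0)]
    for update in windowed_average_updates(readings, size=300):
        print(update)
    # (0, 20.0), then (0, 21.0), then (300, 30.0), then (0, 22.0)
```

Emitting an update per event is one design choice; engines can also hold results back until a window closes (Kafka Streams, for instance, offers a `suppress` operator for this), trading latency for fewer, final results.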
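Slide 9's "you choose how long to wait" corresponds to a grace period for late events (Kafka Streams calls it a grace period; Flink has a related "allowed lateness" setting alongside watermarks). The sketch below is a simplified model loosely inspired by those ideas, not either engine's implementation: "stream time" is the highest event-time seen so far, and in this sketch's convention a window stops accepting events once stream time reaches `window_end + grace`. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def tumbling_counts_with_grace(event_times, size: int, grace: int):
    """Count events per tumbling event-time window, accepting late events
    until stream time reaches window_end + grace.

    Returns ({window_start: count}, [dropped event times]).
    """
    counts = defaultdict(int)
    dropped = []
    stream_time = float("-inf")  # highest event-time observed so far
    for ts in event_times:
        stream_time = max(stream_time, ts)
        start = ts - ts % size
        if stream_time >= start + size + grace:
            dropped.append(ts)  # its window is already closed: too late
        else:
            counts[start] += 1  # on time, or late but within the grace period
    return dict(counts), dropped


if __name__ == "__main__":
    events = [1, 4, 12, 8, 17, 3]
    print(tumbling_counts_with_grace(events, size=10, grace=5))  # ({0: 3, 10: 2}, [3])
    print(tumbling_counts_with_grace(events, size=10, grace=0))  # ({0: 2, 10: 2}, [8, 3])
```

A longer grace period keeps windows open longer and accepts more late data, at the cost of holding state longer and delaying final results; a shorter one finalizes sooner but drops more stragglers.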