
[DataSciCon] Divide, Distribute and Conquer: Stream v. Batch

Data is flowing everywhere around us, from phones, credit cards, sensor-equipped buildings, vending machines, thermostats, trains, buses, planes, posts to social media, digital pictures and video, and so on.

http://www.datascicon.tech

Viktor Gamov

November 30, 2017

Transcript

  1. @gamussa @confluentinc @DataSciCon Time model

    Different use cases, different time semantics. The majority of use cases require event-time semantics.
  2. Time model

    Different use cases, different time semantics. The majority of use cases require event-time semantics. Other use cases may require processing-time semantics or special variants such as ingestion-time.
  3. Windowing

    [Figure: input data plotted on processing-time and event-time axes; colors represent different users' events (alice, bob, dave), and rectangles denote different event-time windows.]
  4. Windowing

    Windowing is an operation that groups events. Most commonly needed: time windows and session windows. Examples:
    • Real-time monitoring: 5-minute averages
    • Reader behavior on a website: user browsing sessions
  5. Out-of-order and late data

    Out-of-order and late data is very common in practice, not a rare corner case.
    • Related to the time model discussion
  6. Out-of-order and late data

    Users with mobile phones enter an airplane and lose Internet connectivity.
  7. Out-of-order and late data

    Users with mobile phones enter an airplane and lose Internet connectivity. Emails are written during the 10-hour flight.
  8. Out-of-order and late data

    Users with mobile phones enter an airplane and lose Internet connectivity. Emails are written during the 10-hour flight. When Internet connectivity is restored, the phones send the queued emails.
  9. Stream Processing: results

    • Yes, it’s possible to get computation results in real time
    • Windows: a finite view of infinite data
    • Based on temporal characteristics of the event
  10. Stream Processing: results

    • Yes, it’s possible to get computation results in real time
    • Windows: a finite view of infinite data
    • Based on temporal characteristics of the event
    • Late event processing
    • You choose how long to wait
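
The ideas in these slides (event-time semantics, time windows, out-of-order and late data, and choosing how long to wait) can be illustrated with a small sketch. This is a hypothetical, minimal Python illustration, not code from the talk and not the API of any stream-processing library. The names `Event` and `TumblingWindowAverager`, the 300-second (5-minute) window size and the 60-second grace period are all assumptions. The sketch groups events per user into 5-minute tumbling windows by event time, tracks "stream time" as the largest event time seen so far, keeps each window open until stream time passes the window end plus the grace period, and drops events that arrive after that.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    key: str          # e.g. a user id such as "alice" (hypothetical record type)
    event_time: int   # seconds; when the event happened on the device
    value: float


WindowResult = Tuple[str, int, float]  # (key, window start, average value)


class TumblingWindowAverager:
    """Hypothetical event-time tumbling-window average with a grace period.

    Stream time is the largest event time seen so far. The window
    [start, start + size) for a key stays open until stream time reaches
    start + size + grace; events for an already-closed window are dropped.
    """

    def __init__(self, size: int, grace: int) -> None:
        self.size = size
        self.grace = grace  # assumption: how long we choose to wait for late data
        self.stream_time: Optional[int] = None
        self.open_windows: DefaultDict[Tuple[str, int], List[float]] = defaultdict(list)
        self.dropped: List[Event] = []

    def _window_start(self, t: int) -> int:
        # Align the event time to the start of its fixed-size window.
        return t - t % self.size

    def _is_closed(self, start: int) -> bool:
        return self.stream_time is not None and self.stream_time >= start + self.size + self.grace

    def process(self, event: Event) -> List[WindowResult]:
        """Add one event and return the results of any windows it closes."""
        start = self._window_start(event.event_time)
        if self._is_closed(start):
            # Arrived after its window's grace period ended: too late.
            self.dropped.append(event)
            return []
        self.open_windows[(event.key, start)].append(event.value)
        # Out-of-order events never move stream time backwards.
        if self.stream_time is None or event.event_time > self.stream_time:
            self.stream_time = event.event_time
        return self._emit_closed()

    def _emit_closed(self) -> List[WindowResult]:
        # Emit every window whose end plus grace period stream time has passed.
        closed = sorted(k for k in self.open_windows if self._is_closed(k[1]))
        results = []
        for key, start in closed:
            values = self.open_windows.pop((key, start))
            results.append((key, start, sum(values) / len(values)))
        return results


if __name__ == "__main__":
    agg = TumblingWindowAverager(size=300, grace=60)
    arrivals = [
        Event("bob", 100, 10.0),
        Event("bob", 400, 20.0),
        Event("alice", 320, 5.0),   # out of order, but within the grace period
        Event("bob", 700, 30.0),
        Event("alice", 250, 7.0),   # too late: window [0, 300) already closed
    ]
    for e in arrivals:
        for key, start, avg in agg.process(e):
            print(f"{key} window [{start}, {start + agg.size}): avg={avg}")
    print("dropped:", agg.dropped)
```

In the usage example, alice's event with event time 320 arrives after bob's event at 400, much like an email queued on a phone during a flight, yet it still lands in its correct event-time window because that window is within the grace period. Her event at 250 arrives after its window has closed and is dropped. A longer grace period trades result latency for completeness, which is the "you choose how long to wait" decision from the last slide.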