[DataSciCon] Divide, Distribute and Conquer: Stream v. Batch
Data is flowing everywhere around us: from phones, credit cards, sensor-equipped buildings, vending machines, thermostats, trains, buses, planes, posts to social media, digital pictures and video, and so on.
@gamussa @confluentinc @DataSciCon
Time model
• Different use cases need different time semantics
• The majority of use cases require event-time semantics
• Other use cases may require processing-time or special variants such as ingestion-time
Windowing
[Figure: input data plotted by processing-time against event-time. Colors represent events from different users (alice, bob, dave); rectangles denote different event-time windows.]
Windowing
• Windowing is an operation that groups events
• Most commonly needed: time windows, session windows
• Examples:
  • Real-time monitoring: 5-minute averages
  • Reader behavior on a website: user browsing sessions
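The two window types named above can be illustrated with a small, library-free sketch. Everything here is an assumption made for illustration: the function names `tumbling_windows` and `session_windows`, the `(user, event_time_seconds, value)` record layout, and the sample data are hypothetical and do not come from the talk or from any stream processing library.

```python
from collections import defaultdict


def tumbling_windows(events, size_s):
    """Group (key, event_time_s, value) records into fixed, non-overlapping windows."""
    windows = defaultdict(list)
    for key, ts, value in events:
        start = ts - ts % size_s  # align the event time to its window start
        windows[(key, start)].append(value)
    return dict(windows)


def session_windows(events, gap_s):
    """Group each key's events into sessions split by more than gap_s of inactivity."""
    by_key = defaultdict(list)
    for key, ts, _ in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        current = [stamps[0]]
        result = []
        for ts in stamps[1:]:
            if ts - current[-1] > gap_s:
                # Inactivity gap exceeded: close the session and start a new one.
                result.append((current[0], current[-1]))
                current = [ts]
            else:
                current.append(ts)
        result.append((current[0], current[-1]))
        sessions[key] = result
    return sessions


# Hypothetical sample data: (user, event time in seconds, measured value)
events = [
    ("alice", 10, 2.0),
    ("alice", 200, 4.0),
    ("bob", 290, 1.0),
    ("alice", 310, 6.0),
    ("bob", 2000, 3.0),
]

# 5-minute (300 s) averages per user, as in the monitoring example
averages = {k: sum(v) / len(v) for k, v in tumbling_windows(events, 300).items()}

# Browsing sessions with a 10-minute (600 s) inactivity gap
sessions = session_windows(events, 600)
```

Time windows have a fixed size chosen up front, while session windows grow with user activity and close only after a period of silence, which is why they suit the browsing-session example.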
Out-of-order and late data
• Users with mobile phones board an airplane and lose Internet connectivity
• Emails are written during the 10h flight
• Once Internet connectivity is restored, the phones send the queued emails
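To see why the airplane scenario matters, the sketch below buckets the same emails by hour twice: once by when they were written (event-time) and once by when the system received them (processing-time). The flight date, the timestamps and the helper `hour_bucket` are hypothetical values invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Hypothetical flight: takeoff at 08:00, landing 10 hours later
takeoff = datetime(2017, 11, 29, 8, 0)
landing = takeoff + timedelta(hours=10)

# (event_time, processing_time): emails written in flight, delivered after landing
emails = [
    (takeoff + timedelta(hours=1, minutes=15), landing + timedelta(minutes=5)),
    (takeoff + timedelta(hours=4, minutes=30), landing + timedelta(minutes=5)),
    (takeoff + timedelta(hours=9, minutes=50), landing + timedelta(minutes=6)),
]

by_event_time = Counter(hour_bucket(written) for written, _ in emails)
by_processing_time = Counter(hour_bucket(received) for _, received in emails)
```

With event-time semantics the three emails land in three different hourly buckets spread across the flight. With processing-time semantics they all collapse into the single hour after landing, which misrepresents when the user was actually active.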
Stream Processing: results
• Yes, it’s possible to get computation results in real time
• Windows: a finite view of infinite data
  • Based on temporal characteristics of the event
• Late event processing
  • You choose how long to wait
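"You choose how long to wait" can be sketched as a grace period on event-time windows. The code below is a simplified model, not the API of any particular stream processor: the function `window_with_grace`, its parameters, and the rule that a window closes once the highest event time seen so far passes `window end + grace` are assumptions made for illustration.

```python
from collections import defaultdict


def window_with_grace(event_times, size_s, grace_s):
    """Count events per tumbling event-time window, processing them in arrival order.

    Stream time is the largest event time seen so far. A window
    [start, start + size_s) accepts records until stream time reaches
    start + size_s + grace_s; anything arriving later is dropped.
    """
    counts = defaultdict(int)
    dropped = []
    stream_time = 0
    for ts in event_times:
        stream_time = max(stream_time, ts)
        start = ts - ts % size_s
        if stream_time >= start + size_s + grace_s:
            dropped.append(ts)  # window already closed: too late
        else:
            counts[start] += 1  # on time, or late but within the grace period
    return dict(counts), dropped


# Hypothetical event times (seconds) in the order they arrive; 58 and 40 are out of order
arrivals = [5, 62, 58, 130, 40, 125]

# Wait 30 s past the window end: the event at t=40 arrives too late
counts, dropped = window_with_grace(arrivals, size_s=60, grace_s=30)

# Wait 120 s instead: the same event is still accepted
patient_counts, patient_dropped = window_with_grace(arrivals, size_s=60, grace_s=120)
```

A longer grace period makes results more complete at the cost of keeping windows open, and their state in memory, for longer. A shorter one finalizes results sooner but discards more late events.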