[DataSciCon] Divide, Distribute and Conquer: Stream v. Batch

Data is flowing everywhere around us: from phones, credit cards, sensor-equipped buildings, vending machines, thermostats, trains, buses, planes, social media posts, digital pictures and video, and so on.

http://www.datascicon.tech

Viktor Gamov

November 30, 2017

Transcript

  1. DIVIDE, DISTRIBUTE AND CONQUER: STREAM V. BATCH

  2. Stream v. Batch

  3. Who am I?

  4. Solutions Architect Who am I?

  5. Solutions Architect Developer Advocate Who am I?

  6. Solutions Architect Developer Advocate @gamussa in internetz Who am I?

  7. Solutions Architect Developer Advocate @gamussa in internetz Hey you, yes, you, go follow me on Twitter © Who am I?

  8. @gamussa @confluentinc @DataSciCon BATCH PROCESSING Data at rest

  9. @gamussa @confluentinc @DataSciCon Data and Queries Origin and processing

  10. @gamussa @confluentinc @DataSciCon

  11. @gamussa @confluentinc @DataSciCon Data…

  12. @gamussa @confluentinc @DataSciCon Data…

  13. @gamussa @confluentinc @DataSciCon Data… ✓ … inherently immutable ✓ … time-based

  14. @gamussa @confluentinc @DataSciCon CRUD -> CR

  15. @gamussa @confluentinc @DataSciCon Processing is a query

  16. @gamussa @confluentinc @DataSciCon Processing is a query Function on full data set

  17. @gamussa @confluentinc @DataSciCon Processing is a query Function on full data set Projection

  18. @gamussa @confluentinc @DataSciCon Processing is a query Function on full data set Projection Aggregations

  19. @gamussa @confluentinc @DataSciCon Processing is a query Function on full data set Projection Aggregations Joins

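A minimal sketch of the "processing is a query" idea from slides 15 to 19, assuming a tiny in-memory data set of flight records. The names `FLIGHTS`, `AIRLINES`, `projection`, `aggregation`, and `join` are illustrative assumptions, not taken from the talk. Each query is a pure function over the full, immutable data set.

```python
from collections import defaultdict

# Hypothetical immutable batch data set: facts are only created and read (CR).
FLIGHTS = (
    {"flight": "DL100", "airline": "DL", "delay_min": 12},
    {"flight": "DL200", "airline": "DL", "delay_min": 0},
    {"flight": "UA300", "airline": "UA", "delay_min": 30},
)
# Hypothetical lookup table used by the join.
AIRLINES = {"DL": "Delta", "UA": "United"}


def projection(rows):
    """Keep only the columns of interest."""
    return [(r["flight"], r["delay_min"]) for r in rows]


def aggregation(rows):
    """Average delay per airline code."""
    totals = defaultdict(lambda: [0, 0])  # airline -> [sum, count]
    for r in rows:
        totals[r["airline"]][0] += r["delay_min"]
        totals[r["airline"]][1] += 1
    return {airline: s / n for airline, (s, n) in totals.items()}


def join(rows, airlines):
    """Enrich each flight with the airline name from the lookup table."""
    return [{**r, "airline_name": airlines[r["airline"]]} for r in rows]
```

Because the input never changes, rerunning any of these functions over the same data set always yields the same result, which is what makes recomputation in a batch layer safe.
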
  20. @gamussa @confluentinc @DataSciCon Lambda architecture origins http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html

  21. None
  22. None
  23. @gamussa @confluentinc @DataSciCon https://mapr.com/developercentral/lambda-architecture/ Lambda Architecture

  24. @gamussa @confluentinc @DataSciCon

  25. @gamussa @confluentinc @DataSciCon TFW Trying to explain modern big data landscape

  26. @gamussa @confluentinc @DataSciCon

  27. @gamussa @confluentinc @DataSciCon STREAM PROCESSING Data in motion

  28. @gamussa @confluentinc @DataSciCon Streaming Platform

  29. @gamussa @confluentinc @DataSciCon Streaming Platform

  30. @gamussa @confluentinc @DataSciCon

  31. @gamussa @confluentinc @DataSciCon Interesting cases Before You Go

  32. I FOUND YOUR LACK OF FAULT TOLERANCE DISTURBING

  33. Data is too important to store it in one computer

  34. None
  35. None
  36. None
  37. None
  38. @gamussa @confluentinc @DataSciCon How to process «infinite» data?

  39. @gamussa @confluentinc @DataSciCon Time model

  40. @gamussa @confluentinc @DataSciCon Time model Different use cases time semantics

  41. @gamussa @confluentinc @DataSciCon Time model Different use cases time semantics Majority of use cases require event-time semantics

  42. @gamussa @confluentinc @DataSciCon Time model Different use cases time semantics Majority of use cases require event-time semantics Other use cases may require processing-time or special variants like ingestion-time

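The three time semantics named on slide 42 can be sketched as a choice of which timestamp a processor attaches to a record. This is only an illustration: the `Event` class, the `timestamp_for` helper, and the sample values are assumptions, not an API from any streaming library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    event_time: float      # when the event happened at the source (seconds)
    ingestion_time: float  # when the platform received the event (seconds)


def timestamp_for(event, semantics, processing_time):
    """Return the timestamp to use under the chosen time semantics.

    processing_time is the wall-clock time at which the processor
    handles the event; it is passed in to keep the sketch deterministic.
    """
    if semantics == "event-time":
        return event.event_time
    if semantics == "ingestion-time":
        return event.ingestion_time
    if semantics == "processing-time":
        return processing_time
    raise ValueError(f"unknown time semantics: {semantics}")


# Hypothetical email written offline at t=100 s, received roughly 10 hours later.
email = Event(key="alice", event_time=100.0, ingestion_time=36_100.0)
```

With event-time semantics the email is attributed to the moment it was written; with the other two it is attributed to the moment the system saw it, which can differ by hours.
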
  43. @gamussa @confluentinc @DataSciCon Time Model

  44. @gamussa @confluentinc @DataSciCon Time Model

  45. @gamussa @confluentinc @DataSciCon Time Model

  46. @gamussa @confluentinc @DataSciCon Windowing [Figure: input data, where colors represent different users' events; rectangles denote different event-time windows; axes: processing-time, event-time; users: alice, bob, dave]

  47. @gamussa @confluentinc @DataSciCon https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101

  48. @gamussa @confluentinc @DataSciCon Windowing Windowing is an operation that groups events Most commonly needed: time windows, session windows Examples: ✗ Real-time monitoring: 5-minute averages ✗ Reader behavior on a website: user browsing sessions

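The two window types from slide 48 can be sketched in a few lines, assuming events carry integer event-time timestamps in seconds. The function names, the 5-minute window size, and the 30-minute inactivity gap are illustrative assumptions.

```python
from collections import defaultdict


def tumbling_window_averages(events, size_s=300):
    """Group (event_time_s, value) pairs into fixed, non-overlapping
    windows of size_s seconds and return {window_start: average}."""
    buckets = defaultdict(list)
    for ts, value in events:
        window_start = ts - ts % size_s  # align to the window boundary
        buckets[window_start].append(value)
    return {start: sum(v) / len(v) for start, v in sorted(buckets.items())}


def session_windows(timestamps, gap_s=1800):
    """Group timestamps into sessions; a new session starts whenever the
    gap since the previous event exceeds gap_s seconds."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap_s:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

Time windows have boundaries fixed by the clock, which suits the 5-minute monitoring averages. Session windows have boundaries defined by the data itself, which suits per-user browsing sessions.
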
  49. @gamussa @confluentinc @DataSciCon Out-of-order and late data Is very common in practice, not a rare corner case ✗ Related to time model discussion

  50. @gamussa @confluentinc @DataSciCon Out-of-order and late data

  51. @gamussa @confluentinc @DataSciCon Out-of-order and late data Users with mobile phones enter airplane, lose Internet connectivity

  52. @gamussa @confluentinc @DataSciCon Out-of-order and late data Users with mobile phones enter airplane, lose Internet connectivity Emails are being written during the 10h flight

  53. @gamussa @confluentinc @DataSciCon Out-of-order and late data Users with mobile phones enter airplane, lose Internet connectivity Emails are being written during the 10h flight Internet connectivity is restored, phones will send queued emails now

  54. @gamussa @confluentinc @DataSciCon Stream Processing: results

  55. @gamussa @confluentinc @DataSciCon Stream Processing: results • Yes, it’s possible to get computation results in real time

  56. @gamussa @confluentinc @DataSciCon Stream Processing: results • Yes, it’s possible to get computation results in real time • Windows – finite view of infinite data • Based on temporal characteristics of the event

  57. @gamussa @confluentinc @DataSciCon Stream Processing: results • Yes, it’s possible to get computation results in real time • Windows – finite view of infinite data • Based on temporal characteristics of the event • Late event processing • You choose how long to wait

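"You choose how long to wait" can be sketched as a windowed counter with a grace period. This is a simplified illustration, not the Kafka Streams API: the `WindowedCounter` class, its parameters, and the rule that stream time is the highest event time seen so far are all assumptions made for the example.

```python
class WindowedCounter:
    """Counts events per tumbling event-time window and accepts late
    events until the window's end plus a grace period has passed."""

    def __init__(self, size_s, grace_s):
        self.size_s = size_s
        self.grace_s = grace_s
        self.counts = {}       # window_start -> number of accepted events
        self.stream_time = 0   # highest event time observed so far
        self.dropped = 0       # events that arrived after the grace period

    def add(self, event_time_s):
        """Return True if the event was counted, False if it was too late."""
        self.stream_time = max(self.stream_time, event_time_s)
        start = event_time_s - event_time_s % self.size_s
        window_end = start + self.size_s
        if self.stream_time >= window_end + self.grace_s:
            self.dropped += 1
            return False
        self.counts[start] = self.counts.get(start, 0) + 1
        return True


counter = WindowedCounter(size_s=60, grace_s=30)
results = [counter.add(t) for t in (10, 70, 50, 200, 55)]
```

In this run the event at t=50 arrives out of order but within the grace period, so it is still counted in window 0. The event at t=55 arrives after stream time has advanced to 200 and is dropped. A longer grace period trades result latency and state size for completeness.
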
  58. @gamussa @confluentinc @DataSciCon DEMO Let’s analyze flights

  59. @gamussa @confluentinc @DataSciCon https://www.confluent.io/blog/predicting-flight-arrivals-with-the-apache-kafka-streams-api/

  60. @gamussa @confluentinc @DataSciCon Example: Training Flight Prediction Model

  61. @gamussa @confluentinc @DataSciCon https://github.com/confluentinc/online-inferencing-blog-application

  62. @gamussa @confluentinc @DataSciCon Thanks! questions? @gamussa viktor@confluent.io