To Stream or Not To Stream? The Landscape of Online Analytics

EVAM Solution Day 2015, Istanbul.


  1. To Stream or Not To Stream? The Landscape of Online Analytics. Gianmarco De Francisci Morales
  2. (no text on this slide)
  3. 5 Questions: What? Why? How? Where? When?

  4. What? Data Stream

  5. Big Data: Too big to handle

  6. Big Data Streams: Drinking from a firehose

  7. Stream Analytics: Batch data = snapshot of streaming data. Descriptive, Predictive, Prescriptive
  8. Value of Data

  9. Online vs Real-Time

  10. Why? Motivation and Goal

  11. Asynchronous Processing: “Most of what happens inside a company is some new information comes in and the company reacts to that asynchronously.” (Jay Kreps, Confluent founder, ex-LinkedIn)
  12. Nervous System vs Silos

  13. Perishable Insights: Great instantaneous value; Ephemeral; Opportunity cost

  14. Hype Cycle

  15. Hype Cycle

  16. How? Stream Processing Architecture

  17. Architecture Overview

  18. Ingestion: Plethora of solutions; Still ad-hoc (read: messy). Schema evolution: Avro. Column-store: Parquet. Log collection: Flume
  19. Brokerage

  20. Processing [diagram labels: Input Stream, PE, PEI, Output Stream, Event routing]
  21. Output. Stream: Kafka, Further processing. View: Key-Value Store, Applications. Reactive

  22. Example: Reactive Web App

  23. Lambda vs Kappa

  24. Where? Applications

  25. Application Domains: Industrial applications; Telecommunications and networks; Web applications; Internet of Things
  26. Predictive Maintenance

  27. Text Search

  28. Machine Learning: SAMOA

  29. Anomaly Detection

  30. When? Adoption Risks

  31. 5 Years Early: “Despite considerable hype and reported successes for early adopters, 54% of survey respondents report no plans to invest at this time, while only 18% have plans to invest in Hadoop over the next 2 years.” (Gartner, 2015)
  32. Cost: Not an issue. Cheap hardware; Cloud-based solutions (Amazon Kinesis, MSFT Azure Stream Analytics); Open source
  33. Ease: Inherently harder. Ops best practices not ironed out (yet); Lack of skills, training, and support; Rethink applications from scratch
  34. Actionable Insights? Define what you want; Moving target; Garbage in, garbage out
  35. Conclusions Who?

  36. 5 Answers: What? Stream analytics. Why? Perishable insights. How? Asynchronous processing. Where? Everywhere. When? In 5 years
  37. Slow Fish or Fast Fish: Which fish will you

  38. Thanks! https://samoa.incubator.apache.org @ApacheSAMOA @gdfm7 gdfm@acm.org