To Stream or Not To Stream? The Landscape of Online Analytics

EVAM Solution Day 2015, Istanbul.

Transcript

  1. To Stream or Not To Stream?
    The Landscape of Online Analytics
    Gianmarco De Francisci Morales

    [email protected]
    @gdfm7

  2. View Slide

  3. 5 Questions
    What?
    Why?
    How?
    Where?
    When?

  4. What?
    Data Stream

  5. Big Data
    Too big to handle

  6. Big Data Streams
    Drinking from a firehose

  7. Stream Analytics
    Batch data = snapshot of streaming data
    Descriptive
    Predictive
    Prescriptive
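
One way to read the claim that batch data is a snapshot of streaming data: an incrementally maintained statistic, inspected at any point, matches what a batch job would compute over the events seen so far. The sketch below is illustrative only; the `RunningMean` class and the sample readings are assumptions, not material from the talk.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class RunningMean:
    """Incrementally maintained mean; the state is constant-size."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        # Incremental update: no past events need to be stored.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


# Hypothetical readings, in arrival order.
events = [4.0, 8.0, 6.0, 2.0]

state = RunningMean()
snapshots = [state.update(v) for v in events]

# A batch job over the first three events sees the same value
# as the streaming state after the third update.
batch_answer = fmean(events[:3])
```

Here `snapshots[2]` and `batch_answer` agree, which is the sense in which a batch result is one frozen frame of the continuously updated stream state.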

  8. Value of Data

  9. Online vs Real-Time

  10. Why?
    Motivation and Goal

  11. Asynchronous Processing
    “Most of what happens inside a company is some new information comes in and the company reacts to that asynchronously.”
    –Jay Kreps, Confluent founder (ex-LinkedIn)

  12. Nervous System vs Silos

  13. Perishable Insights
    Great instantaneous value
    Ephemeral
    Opportunity cost

  14. Hype Cycle

  15. Hype Cycle

  16. How?
    Stream Processing Architecture

  17. Architecture Overview

  18. Ingestion
    Plethora of solutions
    Still ad-hoc (read: messy)
    Schema evolution: Avro
    Column-store: Parquet
    Log collection: Flume

  19. Brokerage

  20. Processing
    [Diagram labels: Input Stream, PE, PEI, Event routing, Output Stream]
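
The diagram labels on this slide (input stream, PE, PEI, event routing, output stream) suggest events being routed to parallel instances of a processing element. A minimal sketch of key-based routing follows, assuming a hash-partitioning scheme; `CounterPEI`, `route`, and the sample events are hypothetical names chosen for illustration and do not come from the talk or from any particular framework.

```python
from collections import defaultdict
from zlib import crc32


class CounterPEI:
    """One processing element instance: keeps a running total per key."""

    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, key, value):
        self.counts[key] += value
        return key, self.counts[key]


def route(key: str, parallelism: int) -> int:
    # crc32 is stable across runs, unlike the built-in hash() for strings,
    # so the same key always lands on the same instance.
    return crc32(key.encode("utf-8")) % parallelism


# Three parallel instances of the same processing element.
instances = [CounterPEI() for _ in range(3)]

# Hypothetical input stream of (key, value) events.
input_stream = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1)]

# Each event is routed by key, processed, and emitted downstream.
output_stream = [
    instances[route(key, len(instances))].process(key, value)
    for key, value in input_stream
]
```

Because routing depends only on the key, all state for a given key lives in exactly one instance, so the per-key totals are correct without any coordination between instances.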

  21. Output
    Stream: Kafka
    Further processing
    View: Key-Value Store
    Applications
    Reactive callbacks
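
The output options listed on this slide can be pictured as a view that the stream processor keeps up to date and that applications either query or subscribe to. The sketch below uses an in-memory dictionary as a stand-in for a key-value store and plain function calls as a stand-in for the stream; `MaterializedView` and its methods are hypothetical names, not an API from Kafka or any specific store.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, int], None]


class MaterializedView:
    """In-memory stand-in for a key-value view fed by a result stream."""

    def __init__(self) -> None:
        self._store: Dict[str, int] = {}
        self._subscribers: List[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        # Applications register interest instead of polling the view.
        self._subscribers.append(callback)

    def apply(self, key: str, value: int) -> None:
        # Update the view, then push the change to every subscriber.
        self._store[key] = value
        for callback in self._subscribers:
            callback(key, value)

    def get(self, key: str, default: int = 0) -> int:
        # Point lookup, as an application would do against the store.
        return self._store.get(key, default)


view = MaterializedView()
notifications = []
view.subscribe(lambda key, value: notifications.append((key, value)))

# Hypothetical results arriving from the processing stage.
for key, value in [("clicks", 1), ("views", 5), ("clicks", 2)]:
    view.apply(key, value)
```

After the loop, `view.get("clicks")` returns the latest value for that key, while `notifications` holds every change in arrival order, which is the difference between querying a view and reacting to callbacks.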

  22. Example: Reactive Web App

  23. Lambda vs Kappa

  24. Where?
    Applications

  25. Application Domains
    Industrial applications
    Telecommunications and networks
    Web applications
    Internet of Things

  26. Predictive Maintenance

  27. Text Search

  28. Machine Learning
    SAMOA

  29. Anomaly Detection

  30. When?
    Adoption Risks

  31. 5 Years Early
    “Despite considerable hype and reported successes
    for early adopters, 54% of survey respondents report
    no plans to invest at this time, while only 18% have
    plans to invest in Hadoop over the next 2 years.”
    – Gartner, 2015

  32. Cost
    Not an issue
    Cheap hardware
    Cloud-based solutions
    Amazon Kinesis, MSFT Azure Stream Analytics
    Open source

  33. Ease
    Inherently harder
    Ops best practices not ironed out (yet)
    Lack of skills, training, and support
    Rethink applications from scratch

  34. Actionable Insights?
    Define what you want
    Moving target
    Garbage in, garbage out

  35. Conclusions
    Who?

  36. 5 Answers
    What? Stream analytics
    Why? Perishable insights
    How? Asynchronous processing
    Where? Everywhere
    When? In 5 years

  37. Slow Fish or Fast Fish
    Which fish will you be?

  38. Thanks!
    https://samoa.incubator.apache.org
    @ApacheSAMOA
    @gdfm7
    [email protected]
