
Streaming Time Series Analysis

Time series are an omnipresent type of data. How can one predict the future and detect anomalies in an online setting on a large set of time series?

Joachim Rosskopf

October 26, 2017

Transcript

  1. Streaming Time Series Analysis

    Time series are an omnipresent type of data. How can one predict the future and detect anomalies in an online setting on a large set of time series?
    A.I. Meetup Stuttgart #3 - 26.10.17
    Dr. Simon Mueller (@S_P__M) - Joachim Rosskopf (@jrosskopf)
  2. Who are we?

    • Dr. Simon Mueller has a PhD in mathematics and has worked as a postdoc and contractor in practical data science and machine learning.
    • Joachim Rosskopf is currently struggling to finish his PhD in theoretical physics, mainly doing data analysis and optimization algorithms there. To earn some money he works as a consultant for data science & engineering.
    • We met at the AI Meetup some months ago, realized that we have the same ideas, and joined forces.
  3. What are Time Series?

    Figure: stock prices of Apple, Google and Microsoft, normalized to the entry price in the date range.
    • A sequence of data points indexed by a time dimension.
    • In most cases the sequence is sampled discretely at equally spaced points in time.
    • Common time series are real-valued and univariate, but multivariate series and series of categorical data also occur.
  4. Where do the Series stem from?

    Diagram labels: ECommerce, Conversions, Sales, IoT, Interaction, Inventory; Ad Type, Medium, Campaign, Device Category, Cohort, Country, Region, Device Type, Action, Location, Customer, Replenish. Time, Class., Order Point, Material, Turnover.
    Unfolding the dimensions of a traditional data warehouse yields a multitude of time series of the respective measures. Predicting them is of great value!
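To make the combinatorics of unfolding concrete, here is a minimal sketch that enumerates one series per measure and dimension-value combination. The dimension members and the helper `unfold` are hypothetical; the slide names dimensions but not their values.

```python
from itertools import product

# Hypothetical dimension members, chosen only for illustration.
dimensions = {
    "campaign": ["spring_sale", "newsletter", "brand"],
    "device_category": ["desktop", "mobile", "tablet"],
    "country": ["DE", "FR", "US", "UK"],
}
measures = ["conversions", "sales"]


def unfold(dimensions, measures):
    """Return one series key per (measure, dimension-value combination)."""
    names = list(dimensions)
    keys = []
    for measure in measures:
        for combo in product(*(dimensions[name] for name in names)):
            keys.append((measure,) + combo)
    return keys


series_keys = unfold(dimensions, measures)
print(len(series_keys))  # 2 measures * 3 * 3 * 4 combinations = 72
```

Even three small dimensions and two measures already give 72 series; each added dimension multiplies that count.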
  5. What is the Data Problem?

    • Due to combinatorics, LoB and IoT systems produce a lot of time series.
    • People spend a lot of time monitoring, interpreting and predicting time series.
    • But doing that for a large number of series in a timely fashion is laborious and error-prone.
    • It gets even more challenging if one wants to get to the root cause of an anomaly, or predict on a set of correlated or multivariate series.
  6. Algorithms Overview

    Chart: algorithms placed by training & inference complexity against model expressiveness.
    • Recurrent Neural Networks (LSTM)
    • Hierarchical Bayesian Models
    • Autoregressive Integrated Moving Average (ARIMA)
    • Exponential Models
    • Conformal k-NN Anomaly Detector
    • Hierarchical Temporal Memory (HTM)
    No single algorithm works equally well on all series. A challenge is to do the right preprocessing and algorithm selection.
  7. Conformal k-NN Anomaly Detector

    V. Ishimtsev et al. (2017), Conformal k-NN Anomaly Detector for Univariate Data Streams, arXiv:1706.03412
    The Conformal k-NN Anomaly Detector is a distribution-free and performant algorithm for anomaly detection in time series.
    Diagram labels: Training, Calibration, Test; Mean Distance Calculation, Generate Calibration Set, Scoring.
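The three steps named on the slide (mean distance calculation, calibration set, scoring) can be sketched as follows. This is a simplified, static variant written for illustration, not the authors' implementation: the lag embedding, the function names, the defaults `dim=3` and `k=5`, and the plain empirical p-value are all assumptions.

```python
import numpy as np


def embed(series, dim):
    """Turn a 1-D series into overlapping lag vectors of length `dim`."""
    series = np.asarray(series, dtype=float)
    return np.stack([series[i:i + dim] for i in range(len(series) - dim + 1)])


def knn_mean_distance(x, train, k):
    """Non-conformity measure: mean Euclidean distance to the k nearest training vectors."""
    dists = np.sqrt(((train - x) ** 2).sum(axis=1))
    return np.sort(dists)[:k].mean()


def conformal_knn_scores(train_series, calib_series, test_series, dim=3, k=5):
    """Anomaly score per test window: 1 minus the empirical conformal p-value."""
    train = embed(train_series, dim)
    calib = embed(calib_series, dim)
    test = embed(test_series, dim)
    # Calibration set: non-conformity of held-out "normal" windows.
    calib_ncm = np.array([knn_mean_distance(x, train, k) for x in calib])
    scores = []
    for x in test:
        ncm = knn_mean_distance(x, train, k)
        p_value = np.mean(calib_ncm >= ncm)  # share of calibration windows at least as strange
        scores.append(1.0 - p_value)
    return np.array(scores)


# Toy data: an exactly periodic, noise-free signal with one injected spike.
pattern = np.sin(2 * np.pi * np.arange(20) / 20)
series = np.tile(pattern, 18)
test_series = series[300:360].copy()
test_series[30] = 5.0
scores = conformal_knn_scores(series[:200], series[200:300], test_series)
print(np.nonzero(scores == 1.0)[0])  # [28 29 30], the windows containing the spike
```

Because the score is a rank against the calibration set, no assumption about the distribution of the data is needed, which is what "distribution-free" refers to.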
  8. Bayesian generalized additive model (GAM)

    Hastie, T. & Tibshirani, R. (1987), ‘Generalized additive models: some applications’, Journal of the American Statistical Association 82(398), 371–386.
    Bayesian inference bridges the gap between white-box model introspection and black-box predictive performance.
    Model components: growth, seasonality, holidays.
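As a rough illustration of an additive model with growth, seasonality and holiday components, the sketch below fits a linear trend, Fourier terms and a holiday indicator. The functional forms, the names `design_matrix` and `fit_map`, and the toy data are assumptions, since the slide's formulas are not preserved in the transcript. It computes only a point estimate (the posterior mode under a Gaussian prior, which equals ridge regression), not a full Bayesian posterior.

```python
import numpy as np


def design_matrix(t, holidays, period=7.0, order=2):
    """Columns: intercept, linear growth, Fourier seasonality terms, holiday indicator."""
    cols = [np.ones_like(t), t]
    for n in range(1, order + 1):
        cols.append(np.sin(2 * np.pi * n * t / period))
        cols.append(np.cos(2 * np.pi * n * t / period))
    cols.append(np.isin(t, holidays).astype(float))
    return np.column_stack(cols)


def fit_map(X, y, prior_precision=1e-6):
    """Posterior mode under a zero-mean Gaussian prior on the weights (ridge regression)."""
    A = X.T @ X + prior_precision * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


# Toy data: ten weeks of daily values with a known trend, weekly cycle and two holidays.
t = np.arange(70, dtype=float)
holidays = [10.0, 45.0]
y = 2.0 + 0.1 * t + 1.5 * np.sin(2 * np.pi * t / 7.0) + 3.0 * np.isin(t, holidays)

X = design_matrix(t, holidays)
w = fit_map(X, y)
growth = X[:, :2] @ w[:2]            # intercept + slope
seasonality = X[:, 2:-1] @ w[2:-1]   # weekly Fourier terms
holiday_effect = X[:, -1] * w[-1]    # additive bump on holidays
```

Each component can be read off separately, which is the introspection benefit the slide describes.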
  9. Bayesian generalized additive model (GAM)

    Figure: Facebook activity per weekday, with panels:
    • Trend component of the additive model
    • “Day of week” seasonality of the additive model
    • “Day of year” seasonality of the additive model
    • Fit and prediction produced by the posterior additive model
  10. The Future of Data Analytics

    Today: Data Batches, Storage, Business Intelligence, Machine Learning
    Challenges
    • Talent shortage, not automated
    • Reaction is slow
    • Concept drift (model obsolescence)
    Tomorrow: Data Streams, Online Models, Actions & Events
    Advantages
    • Automated model creation
    • Continuous learning
    • Real-time
    • Basis of higher-level analytics, e.g. prediction
  11. Time Series Analysis as Online Learning Problem

    Diagram labels: Processing Time, Event Time; three pipelines of Source, Algo. fit, Algo. predict (or Algo. transform), State, Sink.
    • Fault tolerant
    • Exactly once
    • Event time based
    • Stateful
    • Distributed
    • Parallel
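The fit/predict/state pattern from the diagram can be sketched in plain Python: each series key carries its own small state, a forecast is emitted before the state is updated, and events are handled in event-time order. Simple exponential smoothing stands in for the algorithm, and `SmoothingState`, `fit`, `predict` and `run_stream` are hypothetical names. Sorting a finite list is only a stand-in for the out-of-order handling a streaming engine provides.

```python
from dataclasses import dataclass


@dataclass
class SmoothingState:
    """Per-series state carried from one event to the next."""
    level: float = 0.0
    initialized: bool = False


def predict(state):
    """One-step-ahead forecast from the current state."""
    return state.level


def fit(state, value, alpha=0.5):
    """Update the state with one observation (simple exponential smoothing)."""
    if not state.initialized:
        return SmoothingState(level=value, initialized=True)
    return SmoothingState(level=alpha * value + (1 - alpha) * state.level, initialized=True)


def run_stream(events, alpha=0.5):
    """Process (event_time, series_key, value) tuples in event-time order."""
    states = {}
    forecasts = []
    for event_time, key, value in sorted(events, key=lambda e: e[0]):
        state = states.get(key, SmoothingState())
        # Predict first, then learn from the new value.
        forecast = predict(state) if state.initialized else None
        forecasts.append((event_time, key, forecast))
        states[key] = fit(state, value, alpha)
    return forecasts, states


# Events arrive out of order; event time, not arrival order, drives processing.
events = [(2, "a", 4.0), (1, "a", 2.0), (1, "b", 10.0), (3, "a", 8.0)]
forecasts, states = run_stream(events)
print(forecasts)
# [(1, 'a', None), (1, 'b', None), (2, 'a', 2.0), (3, 'a', 3.0)]
```

Keeping the state per key is what lets such a job be distributed and run in parallel: each series can be updated independently on whichever worker owns its key.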
  12. Meet the AnoFox!

    • Scalable, unsupervised, online anomaly detection and time-series prediction on business and IoT data.
    • Quickly deployable as a building block in the virtual private clouds of customers. We come to where your data and processing happen!
    • We rely on cloud services, open source software and modern data science methods. At the core we use battle-tested data analysis. For higher-level intelligence we use state-of-the-art machine learning research.
  13. Conclusions

    • Time series are an omnipresent type of data, which is especially interesting for business and IoT applications.
    • There exist powerful algorithms to detect anomalies or predict future data points in an unsupervised setting.
    • A lot of time series arrive continuously. Batch processing therefore diminishes the benefit, or is even prohibitive for some applications.
    • Spark Streaming, open source and the cloud are a decent environment for building streaming anomaly detection and prediction applications.
    • If you want details or examples, or want to see some math or code, feel free to reach us after the talk or via email/Twitter.
  14. Thank you for the opportunity to present our ideas!

    Streaming Time Series Analysis
    Simon's email: [email protected]
    Simon's Twitter: @S_P__M
    Joachim's email: [email protected]
    Joachim's Twitter: @jrosskopf
    A.I. Meetup Stuttgart #3 - 26.10.17