of data. How can one predict the future and detect anomalies in an online setting on a large set of time series?
A.I. Meetup Stuttgart #3 - 26.10.17
Dr. Simon Mueller - @S_P__M, Joachim Rosskopf - @jrosskopf
in mathematics and has worked as a postdoc and contractor in practical data science and machine learning.
• Joachim Rosskopf is currently struggling to finish his PhD in theoretical physics, mainly doing data analysis and optimization algorithms there. To earn some money, he works as a consultant for data science & engineering.
• We met at the AI Meetup some months ago and realized that we have the same ideas, so we joined forces.
and Microsoft. Normalized on entry price in date range.
• A time series is a sequence of data points indexed by a time dimension.
• In most cases the sequence is sampled discretely at equally spaced points in time.
• Most common are real-valued univariate series, but multivariate series and series of categorical data also occur.
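The normalization on entry price mentioned in the caption above can be sketched as follows. This is a minimal illustration in plain Python; the function name `normalize_on_entry` and the price values are made up for the example and are not data from the talk.

```python
def normalize_on_entry(prices):
    """Divide every price by the first (entry) price so the series starts at 1.0."""
    if not prices:
        raise ValueError("prices must not be empty")
    entry = prices[0]
    if entry == 0:
        raise ValueError("entry price must be non-zero")
    return [p / entry for p in prices]


# Hypothetical closing prices of two stocks over the same date range.
series_a = [50.0, 55.0, 45.0, 60.0]
series_b = [200.0, 210.0, 190.0, 260.0]

# After normalization both series start at 1.0 and can share one axis.
norm_a = normalize_on_entry(series_a)
norm_b = normalize_on_entry(series_b)
```

Dividing by the entry price makes series with very different price levels directly comparable as relative performance.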
[Diagram: example data warehouse dimensions: IoT, Interaction, Inventory, Ad Type, Medium, Campaign, Device Category, Cohort, Country, Region, Device Type, Action, Location, Customer, Replenish. Time, Class., Order Point, Material, Turnover]
Unfolding the dimensions of a traditional data warehouse leads to a multitude of time series of the respective measures. Predicting them is of great value!
LoB and IoT systems produce a lot of time series.
• People spend a lot of time monitoring, interpreting and predicting time series.
• But doing that for a large number of series in a timely fashion is laborious and error-prone.
• It gets even more challenging if one wants to get to the root cause of an anomaly, or to predict on a set of correlated or multivariate series.
• Recurrent Neural Networks (LSTM)
• Hierarchical Bayesian Models
• Autoregressive Integrated Moving Average (ARIMA)
• Exponential Models
• Conformal k-NN Anomaly Detector
• Hierarchical Temporal Memory (HTM)
No single algorithm works equally well on all series. A challenge is to do the right preprocessing and algorithm selection.
Conformal k-NN Anomaly Detector for Univariate Data Streams, arXiv:1706.03412
The Conformal k-NN Anomaly Detector is a distribution-free and performant algorithm for anomaly detection in time series.
[Diagram: Training, Calibration, Test; Mean Distance Calculation, Generate Calibration Set, Scoring]
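The steps named on this slide (training, calibration, mean distance calculation, scoring) can be sketched roughly as below. This is a simplified illustration and not the reference implementation from the paper. The function names, the window length `dim`, the neighbour count `k`, and the exact p-value definition (share of calibration scores at least as large as the test score) are assumptions made for the example.

```python
import math


def embed(series, dim):
    """Turn a univariate series into overlapping windows (lag vectors) of length dim."""
    return [tuple(series[i:i + dim]) for i in range(len(series) - dim + 1)]


def knn_mean_distance(window, reference, k):
    """Non-conformity score: mean Euclidean distance to the k nearest reference windows."""
    dists = sorted(math.dist(window, r) for r in reference)
    return sum(dists[:k]) / k


def conformal_knn_scores(train, calibration, test, dim=3, k=2):
    """Return one anomaly score in [0, 1] per test window (1.0 = most unusual)."""
    train_w = embed(train, dim)
    # Calibration set: non-conformity scores of held-out "normal" windows.
    calib_scores = [knn_mean_distance(w, train_w, k) for w in embed(calibration, dim)]
    scores = []
    for w in embed(test, dim):
        alpha = knn_mean_distance(w, train_w, k)
        # Empirical p-value: share of calibration scores at least as extreme.
        p_value = sum(1 for c in calib_scores if c >= alpha) / len(calib_scores)
        scores.append(1.0 - p_value)
    return scores


# Hypothetical data: a clean periodic signal, then a test stretch with one spike.
train = [0, 1, 0, -1] * 10
calibration = [0, 1, 0, -1] * 5
test = [0, 1, 0, -1, 0, 1, 8, -1, 0, 1, 0, -1]

scores = conformal_knn_scores(train, calibration, test)
```

In this toy data only the three windows that contain the spike differ from the training pattern, so only those receive a high score. No distributional assumption enters the score; it depends only on ranks against the calibration set.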
R. (1987), ‘Generalized additive models: some applications’, Journal of the American Statistical Association 82(398), 371–386.
Bayesian inference bridges the gap between white-box model introspection and black-box predictive performance.
Components of the additive model: Growth, Seasonality, Holidays.
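To make the idea of an additive model concrete, here is a deliberately simplified sketch that splits a daily series into a linear growth term plus a weekly seasonal term. It uses ordinary least squares and per-weekday averages instead of Bayesian inference, and it leaves out the holiday component, so it only illustrates the additive structure, not the model shown in the talk. The names `fit_additive` and `predict` and the data are invented for the example.

```python
def fit_additive(y, period=7):
    """Fit y[t] ~ intercept + slope * t + seasonal[t % period] in two simple steps."""
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Step 1: linear growth by ordinary least squares.
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    # Step 2: seasonality as the mean residual at each position in the period.
    residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    seasonal = []
    for d in range(period):
        vals = residuals[d::period]
        seasonal.append(sum(vals) / len(vals))
    return intercept, slope, seasonal


def predict(intercept, slope, seasonal, t):
    """Sum of the growth and seasonal components at time index t."""
    return intercept + slope * t + seasonal[t % len(seasonal)]


# Hypothetical series: four weeks of linear growth plus a repeating weekly pattern.
pattern = [1.0, -2.0, 1.0, 0.0, 1.0, -2.0, 1.0]
y = [10.0 + 0.5 * t + pattern[t % 7] for t in range(28)]

intercept, slope, seasonal = fit_additive(y)
forecast = predict(intercept, slope, seasonal, 28)  # first day after the data
```

Because each component is fitted and stored separately, it can be inspected on its own, which is the white-box property the slide refers to.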
additive model
[Figure panels: “Day of week” seasonality of the additive model; “Day of year” seasonality of the additive model; fit and prediction produced by the posterior additive model; Facebook activity per weekday]
[Diagram: event-time stream from Sources through stateful Algo. fit / Algo. predict and Algo. fit / Algo. transform stages into Sinks]
• Fault tolerant
• Exactly once
• Event time based
• Stateful
• Distributed
• Parallel
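The "stateful" property can be illustrated with a small single-process sketch that keeps one running state per series and scores each event as it arrives. It is not a streaming-engine job and provides none of the fault tolerance, exactly-once or distribution guarantees listed above. The detector is a plain running z-score (Welford's algorithm) chosen for brevity, and the names `RunningStats` and `process` as well as the threshold and warm-up values are assumptions.

```python
import math
from collections import defaultdict


class RunningStats:
    """Per-series state: count, mean and sum of squared deviations (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def process(events, threshold=3.0, warmup=5):
    """Score each (key, value) event against its series' state, then update the state."""
    state = defaultdict(RunningStats)
    for key, value in events:
        stats = state[key]
        std = stats.std()
        is_anomaly = (stats.n >= warmup and std > 0
                      and abs(value - stats.mean) > threshold * std)
        yield key, value, is_anomaly
        stats.update(value)  # "fit" step: fold the new point into the state


# Hypothetical interleaved events from two sensors; sensor-a ends with a spike.
a_values = [10.0, 11.0, 9.0, 10.0, 11.0, 9.0, 50.0]
b_values = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
events = []
for a, b in zip(a_values, b_values):
    events.append(("sensor-a", a))
    events.append(("sensor-b", b))

flagged = [(k, v) for k, v, anomalous in process(events) if anomalous]
```

Each series is scored only against its own history, so many series can be handled side by side without batch recomputation.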
and time-series prediction on business and IoT data.
• Quickly deployable as a building block into the virtual private clouds of customers. We come to where your data and processing happen!
• We rely on cloud services, open source software and modern data science methods. At its core we rely on battle-tested data analysis. For higher-level intelligence we utilize state-of-the-art machine learning research.
which is especially interesting for business and IoT applications.
• There exist powerful algorithms to detect anomalies or predict future data points in an unsupervised setting.
• A lot of time series arrive continuously, so batch processing diminishes the benefit or is even prohibitive for some applications.
• Spark Streaming, Open Source and the Cloud are a decent environment for building streaming anomaly detection and prediction applications.
• If you want details, examples, or to see some math or code, feel free to reach us after the talk or via email/Twitter.