Time Series Data Mining Challenges

MACSPro'2019 - Modeling and Analysis of Complex Systems and Processes, Vienna
21 - 23 March 2019

Prof. Jose A. Lozano

Conference website http://macspro.club/

Website https://exactpro.com/
Linkedin https://www.linkedin.com/company/exactpro-systems-llc
Instagram https://www.instagram.com/exactpro/
Twitter https://twitter.com/exactpro
Facebook https://www.facebook.com/exactpro/
Youtube Channel https://www.youtube.com/c/exactprosystems

Exactpro

March 22, 2019

Transcript

  1. 1.

    Time Series Data Mining Challenges

    Jose A. Lozano Basque Center for Applied Mathematics (BCAM) University of the Basque Country UPV/EHU MACsPro, Vienna, March 21-23, 2019
  2. 5.

    Outline of the presentation

    1 Time Series Data Mining Activities; 2 Clustering; 3 (Early) Supervised Classification; 4 Outlier/Anomaly Detection; 5 Conclusions and Future Work
  3. 6.

    Time Series Data Mining Activities

    Outline of the presentation: 1 Time Series Data Mining Activities; 2 Clustering; 3 (Early) Supervised Classification; 4 Outlier/Anomaly Detection; 5 Conclusions and Future Work
  4. 7.

    Time Series Data Mining Activities: Time series all around

    Properties: temporal correlation, high dimensionality, noisy
    Examples: Industry 4.0, bio signals, weather forecasting, shapes
  5. 9.

    Time Series Data Mining Activities: Time series database, our object of study

    A set of time series (usually big)
    Different lengths
    Multidimensional
  6. 11.

    Time Series Data Mining Activities: Supervised classification of time series

    [Figure: a TRAINING SET of series labelled C1, C2, C3 -> ALGORITHM -> CLASSIFIER; a new unlabelled series (?) -> CLASSIFIER -> C2]
  7. 14.

    Clustering

    Outline of the presentation: 1 Time Series Data Mining Activities; 2 Clustering; 3 (Early) Supervised Classification; 4 Outlier/Anomaly Detection; 5 Conclusions and Future Work
  8. 16.

    Clustering: Time series clustering

    Hierarchical or partitional (k-means): we need a DISTANCE
    [Figure: Series 1 to Series 9, axis from 0 to 400]
  9. 18.

    Clustering: Euclidean Distance (ED)

    $D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
    Easy to compute
    Only for series of the same length
    Does not consider the time
    Sensitive to noise
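
A minimal sketch of the Euclidean distance above, in plain Python. The function name and the three toy series are illustrative assumptions, not taken from the talk. The example shows the "does not consider the time" point: a copy of a series shifted by one step ends up farther away than a flat line.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two series of the same length."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs series of the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Toy data (hypothetical).
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0]  # same bump as a, shifted one step to the right
c = [0.5, 0.5, 0.5, 0.5, 0.5]  # flat series

d_ab = euclidean_distance(a, b)  # distance to the shifted copy
d_ac = euclidean_distance(a, c)  # distance to the flat series
print(d_ab, d_ac)
```

Because the comparison is point by point, the shifted copy `b` is penalised at every time stamp where the bump no longer lines up.
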
  10. 19.

    Clustering: Dynamic Time Warping (DTW)

    Takes into account the ordered sequence (time)
    Can deal with series of different lengths
    Computationally expensive: $O(\min\{m, n\}^2)$
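
A minimal sketch of the classic dynamic-programming formulation of DTW, assuming a squared pointwise cost and no warping window (both are choices made for this example, not stated on the slide). It fills a table with one cell per pair of time points, which is where the computational cost comes from.

```python
import math

def dtw_distance(x, y):
    """DTW distance with squared pointwise cost and no window constraint."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

# Toy data (hypothetical).
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 0.0, 1.0, 2.0, 1.0]             # a shifted one step to the right
stretched = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]  # a with one value repeated

print(dtw_distance(a, b))
print(dtw_distance(a, stretched))
```

The second call compares series of different lengths, which the pointwise Euclidean distance cannot do.
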
  11. 21.

    Clustering: Alternatives to calculate distances

    Represent each series by means of a set of features and calculate the distance between the features
    Learn a parametric model for each series and calculate the distance between the parameters
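
The first alternative can be sketched as follows. The three features used here (mean, standard deviation, lag-1 autocorrelation) are an assumption chosen for illustration; the slide does not name a specific feature set. Since every series is mapped to a vector of fixed size, series of different lengths become comparable.

```python
import math

def summary_features(series):
    """Mean, standard deviation and lag-1 autocorrelation of one series."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series) / n
    if var == 0.0:
        lag1 = 0.0  # constant series: autocorrelation is undefined, use 0
    else:
        lag1 = sum(
            (series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1)
        ) / (n * var)
    return [mean, math.sqrt(var), lag1]

def feature_distance(x, y):
    """Euclidean distance between the feature vectors of two series."""
    fx, fy = summary_features(x), summary_features(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fx, fy)))

# Toy data (hypothetical): two oscillating series of different lengths, one trend.
short = [0.0, 1.0, 0.0, 1.0]
long_ = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
trend = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

print(feature_distance(short, long_), feature_distance(short, trend))
```
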
  12. 22.

    Clustering: Distances between series, remarks

    There is no best distance (no free lunch)
    Each problem requires a different distance
    The distance to be used needs to agree with our knowledge about what is far and what is close
    Hint: try several distances
    Challenge: design a method for the (semi)automatic selection of a distance
  13. 24.

    Clustering: Remarks on clustering

    Recent papers on the computation of a mean series
    Alternative clustering methods: graph-based, spectral...
    Challenge: multivariate time series clustering is almost unexplored
  14. 25.

    (Early) Supervised Classification

    Outline of the presentation: 1 Time Series Data Mining Activities; 2 Clustering; 3 (Early) Supervised Classification; 4 Outlier/Anomaly Detection; 5 Conclusions and Future Work
  15. 26.

    (Early) Supervised Classification: Supervised Classification of Time Series

    [Figure: General-purpose classifiers: FEATURES with class C are used for training, and the CLASSIFIER takes FEATURES as input. Specific TS classifiers: SERIES with class C are used for training, and the CLASSIFIER takes SERIES as input]
  16. 27.

    (Early) Supervised Classification: General-purpose classifiers

    Each series is considered an instance
    Each time stamp is considered a feature

    t1   t2   t3   ...  tn   | C
    x11  x12  x13  ...  x1n  | c1
    x21  x22  x23  ...  x2n  | c2
    ...  ...  ...  ...  ...  | ...
    xm1  xm2  xm3  ...  xmn  | c2
  17. 28.

    (Early) Supervised Classification: General-purpose classifiers

    Each series is considered an instance
    Each time stamp is considered a feature

    t2   t1   t3   ...  tn   | C
    x12  x11  x13  ...  x1n  | c1
    x22  x21  x23  ...  x2n  | c2
    ...  ...  ...  ...  ...  | ...
    xm2  xm1  xm3  ...  xmn  | c2
  18. 29.

    (Early) Supervised Classification: General-purpose classifiers

    Each series is considered an instance
    Each time stamp is considered a feature

    t2   t1   t3   ...  tn   | C
    x12  x11  x13  ...  x1n  | c1
    x22  x21  x23  ...  x2n  | c2
    ...  ...  ...  ...  ...  | ...
    xm2  xm1  xm3  ...  xmn  | c2

    CHALLENGE I: When to use general-purpose classifiers and when time-series-specific ones?
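
The swapped t1 and t2 columns on these slides appear to illustrate that a classifier treating time stamps as unordered features cannot notice a reordering, as long as it is applied to every series. A small sketch under that reading, with hypothetical helper names and toy data: the Euclidean distance between two series is unchanged by the swap, while a simple order-aware quantity (total absolute change between consecutive points) is not.

```python
import math

def euclidean(x, y):
    """Pointwise Euclidean distance; ignores the order of the time stamps."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def total_variation(series):
    """Sum of absolute changes between consecutive points; depends on order."""
    return sum(abs(series[t + 1] - series[t]) for t in range(len(series) - 1))

def permute(series, order):
    """Reorder the time stamps of a series according to `order`."""
    return [series[i] for i in order]

# Toy data (hypothetical).
x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 1.0]
order = [1, 0, 2, 3, 4]  # swap t1 and t2, as in the table above

same_distance = euclidean(x, y) == euclidean(permute(x, order), permute(y, order))
changed_shape = total_variation(x) != total_variation(permute(x, order))
print(same_distance, changed_shape)
```
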
  19. 32.

    (Early) Supervised Classification: What is relevant in TSC?

    [Figure labels: PROBLEM I, PROBLEM II; SHAPE, LOCATION]
  20. 33.

    (Early) Supervised Classification: A taxonomy of time series classification methods

    Distance-based classifiers
    Model-based classifiers
    Feature-based classifiers
    Shapelet-based classifiers
  21. 34.

    (Early) Supervised Classification: Distance-based time series classification

    General schema:
    Define a distance between time series
    Use classifiers based on distances: 1-NN ...
  22. 35.

    (Early) Supervised Classification: 1-Nearest Neighbour (1-NN)

    Easy to understand
    Better results with a larger number of series
    Computational cost
    Challenge: which distance?
    [Figure: a new series (?) is compared with training series labelled C1, C2, C3 through distances d1 to d6; the MINIMUM DISTANCE gives class C2]
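
A minimal 1-NN sketch: the query series receives the label of its closest training series. The distance is passed in as a parameter because the choice of distance is exactly the open question raised on the slide; Euclidean distance is used here only as a placeholder, and the data are made up.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def one_nn_predict(train_series, train_labels, query, distance=euclidean):
    """Return the label of the training series closest to `query`."""
    distances = [distance(series, query) for series in train_series]
    nearest = min(range(len(distances)), key=distances.__getitem__)
    return train_labels[nearest]

# Toy training set (hypothetical).
train_series = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]
train_labels = ["C1", "C2", "C3"]

prediction = one_nn_predict(train_series, train_labels, [0.1, 0.0, 0.9, 1.0])
print(prediction)
```

Every prediction computes one distance per training series, which is the computational cost mentioned above.
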
  23. 36.

    (Early) Supervised Classification: Distance-based time series classification. General approach

    [Figure: training SERIES with class C -> DISTANCE MATRIX -> CLASSIFIER; new SERIES -> DISTANCES -> CLASSIFIER -> C]
  24. 37.

    (Early) Supervised Classification: Distance-based time series classification. General approach

    [Figure: training SERIES with class C -> DISTANCE MATRIX -> CLASSIFIER; new SERIES -> DISTANCES -> CLASSIFIER -> C. The rows of distances are VECTORS!]
  25. 38.

    (Early) Supervised Classification: Distance-based time series classification. General approach

    Any algorithm based on distances could be applied
    It depends on the number of series in training
    Computationally expensive
    [Figure: training SERIES with class C -> DISTANCE MATRIX -> CLASSIFIER; new SERIES -> DISTANCES -> CLASSIFIER -> C. The rows of distances are VECTORS!]
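
A sketch of the distance-based representation, assuming Euclidean distance as a stand-in (any series distance could be plugged in) and hypothetical toy data. Each series becomes the vector of its distances to the training series, so the vector length equals the number of training series, and any classifier that works on vectors can then be trained on the rows.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_vector(series, reference_series, distance=euclidean):
    """Represent one series by its distances to every reference series."""
    return [distance(series, ref) for ref in reference_series]

# Toy training set (hypothetical).
train_series = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
]

# Training representation: one row per training series.
distance_matrix = [distance_vector(s, train_series) for s in train_series]

# A new series is mapped into the same vector space before classification.
new_vector = distance_vector([0.0, 0.0, 1.0, 0.0], train_series)
print(distance_matrix)
print(new_vector)
```
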
  26. 39.

    (Early) Supervised Classification: Feature-based time series classification

    [Figure: training SERIES with class C -> FEATURES -> CLASSIFIER; new SERIES -> FEATURES -> CLASSIFIER -> C]
  27. 40.

    (Early) Supervised Classification: Feature-based time series classification

    Features:
    Statistics: mean, variance
    Autoregressive coefficients
    Fourier coefficients
    Shift, trend, ...
    [Figure: training SERIES with class C -> FEATURES -> CLASSIFIER; new SERIES -> FEATURES -> CLASSIFIER -> C]
  28. 41.

    (Early) Supervised Classification: Feature-based time series classification

    Representation independent of the number of series
    Interpretable representation
    Challenge: which features to use?
    [Figure: training SERIES with class C -> FEATURES -> CLASSIFIER; new SERIES -> FEATURES -> CLASSIFIER -> C]
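
A sketch of a feature extractor combining two of the feature families listed for this approach: basic statistics and Fourier coefficients (here, magnitudes of the first few discrete Fourier transform coefficients, computed directly from the definition). The function names, the number of coefficients and the toy series are assumptions for illustration. The output has a fixed size regardless of series length or training set size.

```python
import cmath
import math

def fourier_magnitudes(series, n_coefficients):
    """Magnitudes of the first DFT coefficients, normalised by series length."""
    n = len(series)
    magnitudes = []
    for k in range(n_coefficients):
        coefficient = sum(
            value * cmath.exp(-2j * math.pi * k * t / n)
            for t, value in enumerate(series)
        )
        magnitudes.append(abs(coefficient) / n)
    return magnitudes

def extract_features(series, n_coefficients=4):
    """Feature vector: [mean, variance, |F0|, |F1|, ...]."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((v - mean) ** 2 for v in series) / n
    return [mean, variance] + fourier_magnitudes(series, n_coefficients)

# Toy series (hypothetical): a sine wave with two full cycles over 16 samples.
wave = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
features = extract_features(wave)
print(features)
```

For this wave, the energy concentrates in the coefficient with index 2, which makes the representation easy to read.
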
  29. 42.

    (Early) Supervised Classification: Shapelet-based classification

    Lij could be distance or presence
    Computationally expensive
    When the shapelets are relevant, extremely good results
    Easy to interpret
    [Figure label: Shapelet 2]
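
A sketch of how an entry Lij might be computed, assuming the common definition of the distance between series i and shapelet j as the minimum Euclidean distance over all sliding windows of the series. The "presence" variant is modelled here as thresholding that distance; the threshold value and the toy data are hypothetical.

```python
import math

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    width = len(shapelet)
    if width > len(series):
        raise ValueError("shapelet is longer than the series")
    best = float("inf")
    for start in range(len(series) - width + 1):
        squared = sum((series[start + k] - shapelet[k]) ** 2 for k in range(width))
        best = min(best, squared)
    return math.sqrt(best)

def shapelet_present(series, shapelet, threshold=0.5):
    """Presence feature: True if some window is within `threshold` of the shapelet."""
    return shapelet_distance(series, shapelet) <= threshold

# Toy data (hypothetical).
shapelet = [1.0, 3.0, 1.0]
with_peak = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
flat = [0.0] * 7

print(shapelet_distance(with_peak, shapelet), shapelet_distance(flat, shapelet))
print(shapelet_present(with_peak, shapelet), shapelet_present(flat, shapelet))
```

Scanning every window of every series for every candidate shapelet is what makes the approach expensive.
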
  30. 43.

    (Early) Supervised Classification: Model-based time series classification

    [Figure: training SERIES with class C are used to fit PREDICTION MODEL I, PREDICTION MODEL II and PREDICTION MODEL III; for a new SERIES: what is the most probable model?]
  31. 44.

    (Early) Supervised Classification: Model-based time series classification

    Good results with an appropriate model
    Choice of model
    Existence of model
    [Figure: training SERIES with class C are used to fit PREDICTION MODEL I, PREDICTION MODEL II and PREDICTION MODEL III; for a new SERIES: what is the most probable model?]
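
A sketch of the "most probable model" idea, assuming one zero-mean Gaussian AR(1) model per class, fitted by least squares. The AR(1) choice, the function names and the toy data are assumptions made for this example; the slide does not specify a model family. A new series is assigned to the class whose model gives it the highest log-likelihood.

```python
import math

def fit_ar1(series_list):
    """Least-squares AR(1) fit x[t] = phi * x[t-1] + noise over all given series."""
    num = den = 0.0
    for s in series_list:
        for t in range(1, len(s)):
            num += s[t] * s[t - 1]
            den += s[t - 1] ** 2
    phi = num / den
    residuals = [s[t] - phi * s[t - 1] for s in series_list for t in range(1, len(s))]
    # floor the noise variance to avoid dividing by zero on perfectly fitted data
    sigma2 = max(sum(r * r for r in residuals) / len(residuals), 1e-6)
    return phi, sigma2

def log_likelihood(series, model):
    """Gaussian log-likelihood of a series under an AR(1) model (first point ignored)."""
    phi, sigma2 = model
    total = 0.0
    for t in range(1, len(series)):
        r = series[t] - phi * series[t - 1]
        total += -0.5 * math.log(2 * math.pi * sigma2) - r * r / (2 * sigma2)
    return total

def classify(series, models):
    """Pick the class whose model makes the series most probable."""
    return max(models, key=lambda label: log_likelihood(series, models[label]))

# Toy training data (hypothetical): one series per class.
models = {
    "smooth": fit_ar1([[1.0, 0.9, 0.8, 0.7, 0.6, 0.5]]),
    "alternating": fit_ar1([[1.0, -0.9, 0.8, -0.7, 0.6, -0.5]]),
}

print(classify([0.8, 0.7, 0.65, 0.5], models))
print(classify([0.8, -0.7, 0.65, -0.5], models))
```
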
  32. 45.

    (Early) Supervised Classification: Early time series classification

    Examples:
    Early activity recognition
    Early disease recognition in electrocardiograms
    Early detection of sepsis in newborns
    Early detection of failures in machines (predictive maintenance)
  33. 46.

    (Early) Supervised Classification: Early time series classification

    Balance between accuracy and earliness
    [Figure: a TRAINING SET of series labelled C1, C2, C3 -> ALGORITHM -> EARLY CLASSIFIER; for an incomplete series (?), the classifier either outputs a class (C2) or decides to wait for more data]
  34. 47.

    (Early) Supervised Classification: Early time series classification

    [Figure labels: time stamps t1, t2, t3, ...; classes C1, C2, C3; T1, T2, T3]
  35. 48.

    (Early) Supervised Classification: Early time series classification

    [Figure labels: time stamps t1, t2, t3, ...; classes C1, C2, C3; T1, T2, T3]
  36. 49.

    (Early) Supervised Classification: Early time series classification

    [Figure labels: time stamps t1, t2, t3, ...; classes C1, C2, C3; T1, T2, T3; output: BLUE class]
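
One simple way to trade accuracy against earliness can be sketched as follows. This is not the method shown in the figures, whose details are not recoverable from the transcript; it is an assumed rule for illustration: classify each growing prefix with 1-NN on training prefixes, and output a class only once the nearest class beats the runner-up by a chosen margin. The `margin` and `min_length` parameters and the toy data are hypothetical.

```python
import math

def prefix_distance(prefix, series):
    """Euclidean distance between a prefix and the start of a (longer) series."""
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(prefix, series)))

def early_classify(stream, train_series, train_labels, margin=0.5, min_length=2):
    """Return (label, points_used); waits for more data until the margin is reached."""
    if len(stream) < min_length:
        raise ValueError("stream is shorter than min_length")
    for t in range(min_length, len(stream) + 1):
        prefix = stream[:t]
        nearest = {}  # smallest prefix distance seen for each class
        for series, label in zip(train_series, train_labels):
            d = prefix_distance(prefix, series)
            nearest[label] = min(d, nearest.get(label, float("inf")))
        ranked = sorted(nearest.items(), key=lambda item: item[1])
        confident = len(ranked) == 1 or ranked[1][1] - ranked[0][1] >= margin
        if confident or t == len(stream):
            return ranked[0][0], t

# Toy training set (hypothetical): both classes start identically, then diverge.
train_series = [
    [0.0, 0.0, 1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, -1.0, -2.0, -3.0, -4.0],
]
train_labels = ["up", "down"]

result = early_classify([0.0, 0.1, 0.9, 2.1, 3.0, 3.9], train_series, train_labels)
print(result)
```

A larger `margin` delays the decision in exchange for more evidence, while a smaller one answers earlier at the risk of more mistakes.
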
  37. 51.

    Outlier/Anomaly Detection

    Outline of the presentation: 1 Time Series Data Mining Activities; 2 Clustering; 3 (Early) Supervised Classification; 4 Outlier/Anomaly Detection; 5 Conclusions and Future Work
  38. 61.

    Conclusions and Future Work

    Outline of the presentation: 1 Time Series Data Mining Activities; 2 Clustering; 3 (Early) Supervised Classification; 4 Outlier/Anomaly Detection; 5 Conclusions and Future Work
  39. 62.

    Conclusions and Future Work: Almost unexplored lands

    Challenges:
    Time series subset selection
    Learning in weakly supervised environments: semi-supervised, multi-label, crowd learning
    Theoretical bounds on learning: assumptions on the generating model
  40. 63.

    Conclusions and Future Work: Collaboration

    Usue Mori (UPV/EHU), Amaia Abanda (BCAM), Ane Blazque (Ikerlan), Angel Conde (Ikerlan), Aritz Perez (BCAM), Izaskun Oregui (Tecnalia), Javier del Ser (Tecnalia), Josu Ircio (Ikerlan), Aizea Lojo (Ikerlan)
  41. 64.

    Conclusions and Future Work: Time Series Data Mining Challenges

    Jose A. Lozano Basque Center for Applied Mathematics (BCAM) University of the Basque Country UPV/EHU MACsPro, Vienna, March 21-23, 2019