Slide 1

Slide 1 text

Time Series Data Mining Challenges. Jose A. Lozano, Basque Center for Applied Mathematics (BCAM), University of the Basque Country UPV/EHU. MACsPro, Vienna, March 21-23, 2019

Slide 2

Slide 2 text

Basque Country

Slide 3

Slide 3 text

Donostia-San Sebastián

Slide 4

Slide 4 text

Bilbao

Slide 5

Slide 5 text

Outline of the presentation: 1 Time Series Data Mining Activities; 2 Clustering; 3 (Early) Supervised Classification; 4 Outlier/Anomaly Detection; 5 Conclusions and Future Work

Slide 6

Slide 6 text

Outline of the presentation. Current section: 1 Time Series Data Mining Activities

Slide 7

Slide 7 text

Time series all around. Temporal correlation; high dimensionality; noisy. Industry 4.0; bio signals; weather forecasting; shapes.

Slide 8

Slide 8 text

Time series forecasting

Slide 9

Slide 9 text

Time series database: our object of study. A set of time series (usually big); different lengths; multidimensional.

Slide 10

Slide 10 text

Clustering (figure: clustering algorithm)

Slide 11

Slide 11 text

Supervised classification of time series (figure: a TRAINING SET of series labelled C1, C2, C3 is passed to an ALGORITHM that produces a CLASSIFIER; the classifier assigns class C2 to an unlabelled series)

Slide 12

Slide 12 text

Anomaly/outlier detection

Slide 13

Slide 13 text

Segmentation

Slide 14

Slide 14 text

Outline of the presentation. Current section: 2 Clustering

Slide 15

Slide 15 text

Time series clustering. Examples (figure: clustering algorithm)

Slide 16

Slide 16 text

Time series clustering: hierarchical, partitional. We need a DISTANCE. (figure labels: Series 1 to Series 9, axis from 0 to 400, k-means)

Slide 17

Slide 17 text

Distance between time series: rigid distance, flexible distance

Slide 18

Slide 18 text

Euclidean Distance (ED): $D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$. Easy to compute; only for series of the same length; does not consider time; sensitive to noise.
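
As an illustration of the Euclidean distance on this slide, a minimal Python sketch follows. The function name `euclidean_distance` is an assumption made for this example; it simply rejects series of different lengths, which is the limitation the slide points out.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two series of the same length."""
    if len(x) != len(y):
        raise ValueError("ED is only defined for series of the same length")
    # Point i of x is always compared with point i of y (rigid alignment).
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean_distance([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))  # 5.0
```

Because each point is compared only with the point at the same index, a small shift in time can produce a large distance.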

Slide 19

Slide 19 text

Dynamic Time Warping (DTW): takes into account the ordered sequence (time); can deal with series of different sizes; computationally expensive, $O(\min\{m, n\}^2)$.
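
A minimal sketch of DTW by dynamic programming is given below, assuming one common variant (squared pointwise cost, square root of the accumulated cost, no warping-window constraint). The name `dtw_distance` is illustrative. This unconstrained version fills a full table of size (m+1) x (n+1).

```python
import math


def dtw_distance(x, y):
    """Unconstrained DTW between two series, possibly of different lengths."""
    m, n = len(x), len(y)
    # cost[i][j]: minimal accumulated cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[m][n])


# The second series repeats its first point; warping absorbs the repetition.
print(dtw_distance([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 2.0, 3.0]))  # 0.0
```

Unlike the Euclidean distance, this accepts series of different lengths, at the price of filling the whole table.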

Slide 20

Slide 20 text

Euclidean Distance vs Dynamic Time Warping (figure: EUCLIDEAN and DTW panels)

Slide 21

Slide 21 text

Alternatives to calculate distances: represent each series by means of a set of features and calculate the distance between the features; or learn a parametric model for each series and calculate the distance between the parameters.
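
The second alternative can be sketched as follows, assuming (for illustration only) that the parametric model is a zero-mean AR(1) process fitted by least squares. The names `ar1_coefficient` and `model_distance` are hypothetical; the slides do not specify a model.

```python
def ar1_coefficient(x):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise."""
    num = sum(cur * prev for prev, cur in zip(x[:-1], x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    if den == 0:
        raise ValueError("cannot fit AR(1): all lagged values are zero")
    return num / den


def model_distance(x, y):
    """Distance between two series measured on their fitted parameters."""
    return abs(ar1_coefficient(x) - ar1_coefficient(y))


decaying = [1.0, 0.5, 0.25, 0.125]          # phi = 0.5
alternating = [1.0, -1.0, 1.0, -1.0, 1.0]   # phi = -1.0
print(model_distance(decaying, alternating))  # 1.5
```

Since only the fitted parameters are compared, the two series do not need to have the same length.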

Slide 22

Slide 22 text

Distances between series. Remarks: there is no best distance (no free lunch); each problem requires a different distance; the distance to be used needs to agree with our knowledge about what is far and what is close. Hint: try several distances. Challenge: design a method for the (semi)automatic selection of a distance.

Slide 23

Slide 23 text

...Coming back to clustering: k-means and k-medoids (figure: k-means vs k-medoids)
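
One reason k-medoids is attractive for time series is that it only needs pairwise distances, not a mean series. The sketch below is a simple alternating k-medoids on a precomputed distance matrix; the function name `k_medoids` and the naive initialisation (the first k series) are assumptions made for this example, not a method taken from the slides.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed n x n distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive initialisation (assumption): first k series

    def assign(current):
        # Each series joins the cluster of its closest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster becomes empty
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


values = [0, 1, 2, 10, 11, 12]  # stand-ins for series; any distance could fill the matrix
dist = [[abs(a - b) for b in values] for a in values]
medoids, labels = k_medoids(dist, 2)
print(medoids, labels)  # [1, 4] [0, 0, 0, 1, 1, 1]
```

The distance matrix could equally be filled with DTW or any other distance between series.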

Slide 24

Slide 24 text

Remarks on clustering: recent papers on the computation of a mean series; alternative clustering methods: graph-based, spectral... Challenge: multivariate time series clustering is almost unexplored.

Slide 25

Slide 25 text

Outline of the presentation. Current section: 3 (Early) Supervised Classification

Slide 26

Slide 26 text

Supervised Classification of Time Series (figure: general-purpose classifiers map FEATURES to a class C; specific TS classifiers map SERIES directly to a class C)

Slide 27

Slide 27 text

General-purpose classifiers: each series is considered an instance; each time stamp is considered a feature.
t1    t2    t3    ...   tn    C
x11   x12   x13   ...   x1n   c1
x21   x22   x23   ...   x2n   c2
...   ...   ...   ...   ...   ...
xm1   xm2   xm3   ...   xmn   c2

Slide 28

Slide 28 text

General-purpose classifiers: each series is considered an instance; each time stamp is considered a feature.
t2    t1    t3    ...   tn    C
x12   x11   x13   ...   x1n   c1
x22   x21   x23   ...   x2n   c2
...   ...   ...   ...   ...   ...
xm2   xm1   xm3   ...   xmn   c2

Slide 29

Slide 29 text

General-purpose classifiers: each series is considered an instance; each time stamp is considered a feature.
t2    t1    t3    ...   tn    C
x12   x11   x13   ...   x1n   c1
x22   x21   x23   ...   x2n   c2
...   ...   ...   ...   ...   ...
xm2   xm1   xm3   ...   xmn   c2
CHALLENGE I: When to use general-purpose and when time-series specific?
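
The swapped t1 and t2 columns in the table appear to illustrate that a general-purpose classifier treats time stamps as unordered features. The sketch below checks this for the Euclidean distance, which underlies many such classifiers: applying the same permutation to every series leaves the distance unchanged. The helper names are illustrative.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def permute(series, order):
    """Reorder the time stamps of a series according to `order`."""
    return [series[i] for i in order]


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 2.0, 5.0, 1.0]
order = [1, 0, 2, 3]  # swap t1 and t2, as in the table

d_original = euclidean(a, b)
d_permuted = euclidean(permute(a, order), permute(b, order))
print(d_original == d_permuted)  # True
```

An order-aware measure such as DTW would generally give a different value after such a swap, which is one way to frame the choice between general-purpose and time-series-specific methods.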

Slide 30

Slide 30 text

What is relevant in TSC? (figure: PROBLEM I, PROBLEM II)

Slide 31

Slide 31 text

What is relevant in TSC? (figure: PROBLEM I, PROBLEM II; label: SHAPE)

Slide 32

Slide 32 text

What is relevant in TSC? (figure: PROBLEM I, PROBLEM II; labels: SHAPE, LOCATION)

Slide 33

Slide 33 text

A taxonomy of time series classification methods: distance-based classifiers; model-based classifiers; feature-based classifiers; shapelets-based classifiers.

Slide 34

Slide 34 text

Distance-based time series classification. General schema: define a distance between time series; use classifiers based on distances: 1-NN, ...

Slide 35

Slide 35 text

1-Nearest Neighbour (1-NN): easy to understand; better results with a higher number of series; computational cost. Challenge: what distance? (figure: distances d1 to d6 from the unlabelled series to training series labelled C1, C2, C3; the MINIMUM DISTANCE gives class C2)
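
A minimal 1-NN sketch with a pluggable distance is shown below. The function names and the toy data are assumptions made for this example; the Euclidean distance is used only as a placeholder, since the choice of distance is exactly the open question raised on the slide.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def one_nn_predict(train_series, train_labels, query, distance=euclidean):
    """Return the label of the training series closest to `query`."""
    nearest = min(range(len(train_series)),
                  key=lambda i: distance(train_series[i], query))
    return train_labels[nearest]


train_series = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
train_labels = ["C1", "C2", "C3"]
print(one_nn_predict(train_series, train_labels, [0.9, 1.2, 1.0]))  # C2
```

Every prediction computes one distance per training series, which is where the computational cost mentioned on the slide comes from.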

Slide 36

Slide 36 text

Distance-based time series classification. General approach (figure: the training SERIES and their classes C are turned into a DISTANCE MATRIX used to train a CLASSIFIER; a new SERIES is represented by its DISTANCES and passed to the CLASSIFIER to obtain C)

Slide 37

Slide 37 text

Distance-based time series classification. General approach (figure: the training SERIES and their classes C are turned into a DISTANCE MATRIX used to train a CLASSIFIER; a new SERIES is represented by its DISTANCES and passed to the CLASSIFIER to obtain C; highlighted: VECTORS!)

Slide 38

Slide 38 text

Distance-based time series classification. General approach: any algorithm based on distance could be applied; it depends on the number of series in training; computationally expensive. (figure: the training SERIES and their classes C are turned into a DISTANCE MATRIX used to train a CLASSIFIER; a new SERIES is represented by its DISTANCES and passed to the CLASSIFIER to obtain C; highlighted: VECTORS!)
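
The "VECTORS!" remark can be read as follows: once each series is described by its distances to the training series, it becomes an ordinary feature vector that any classifier can consume. A minimal sketch of that representation follows, with illustrative names and the Euclidean distance as a placeholder.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_features(series, training_series, distance=euclidean):
    """Represent a series by its distances to every training series."""
    return [distance(series, ref) for ref in training_series]


training_series = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [3.0, 4.0, 0.0]]

# Each row of the distance matrix is the feature vector of one training series.
distance_matrix = [distance_features(s, training_series) for s in training_series]

# A new series gets one feature per training series.
new_vector = distance_features([0.0, 0.0, 0.0], training_series)
print(len(new_vector))  # 3
```

The vector length equals the number of training series, which matches the slide's remark that the representation depends on the size of the training set.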

Slide 39

Slide 39 text

Feature-based time series classification (figure: the training SERIES and their classes C are turned into FEATURES used to train a CLASSIFIER; a new SERIES is represented by its FEATURES and passed to the CLASSIFIER to obtain C)

Slide 40

Slide 40 text

Feature-based time series classification. Features: statistics (mean, variance); autoregressive coefficients; Fourier coefficients; shift, trend, ... (figure: the training SERIES and their classes C are turned into FEATURES used to train a CLASSIFIER; a new SERIES is represented by its FEATURES and passed to the CLASSIFIER to obtain C)

Slide 41

Slide 41 text

Feature-based time series classification: representation independent of the number of series; interpretable representation. Challenge: which features to use? (figure: the training SERIES and their classes C are turned into FEATURES used to train a CLASSIFIER; a new SERIES is represented by its FEATURES and passed to the CLASSIFIER to obtain C)
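
As a rough sketch of the feature-based representation, the function below maps a series to three of the features named on the slides (mean, variance, linear trend). The name `extract_features` and the choice of exactly these three features are assumptions made for this example.

```python
import statistics


def extract_features(x):
    """Map a series to a small, fixed-length, interpretable feature vector."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    mean = statistics.fmean(x)
    variance = statistics.pvariance(x)
    # Least-squares slope of x against the time index 0, 1, ..., n-1.
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (v - mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return {"mean": mean, "variance": variance, "trend": num / den}


features = extract_features([1.0, 2.0, 3.0, 4.0, 5.0])
print(sorted(features))  # ['mean', 'trend', 'variance']
```

The output has the same size for any series length and any number of training series, and each entry has a direct interpretation.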

Slide 42

Slide 42 text

Shapelets-based classification: $L_{ij}$ could be distance or presence; computationally expensive; extremely good results when the shapelets are relevant; easy to interpret. (figure label: Shapelet 2)
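
One plausible reading of $L_{ij}$ is the entry for series i and shapelet j in a transformed data set: either the minimum distance between the shapelet and any subsequence of the series, or a binary presence flag. The sketch below assumes that reading; the function names and the presence threshold are hypothetical.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any window of the series."""
    w = len(shapelet)
    if w > len(series):
        raise ValueError("shapelet is longer than the series")
    return min(
        math.sqrt(sum((series[s + k] - shapelet[k]) ** 2 for k in range(w)))
        for s in range(len(series) - w + 1)
    )


def shapelet_transform(dataset, shapelets, threshold=None):
    """Entry [i][j] is a distance, or a 0/1 presence flag if a threshold is given."""
    rows = []
    for series in dataset:
        distances = [shapelet_distance(series, s) for s in shapelets]
        if threshold is None:
            rows.append(distances)
        else:
            rows.append([int(d <= threshold) for d in distances])
    return rows


dataset = [[0.0, 0.0, 1.0, 2.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
shapelets = [[1.0, 2.0, 1.0]]
print(shapelet_transform(dataset, shapelets, threshold=0.5))  # [[1], [0]]
```

Scanning every window of every series for every candidate shapelet is what makes the approach computationally expensive.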

Slide 43

Slide 43 text

Model-based time series classification (figure: the training SERIES and their classes C yield MODEL I, MODEL II and MODEL III; PREDICTION for a new SERIES: what is the most probable model?)

Slide 44

Slide 44 text

Model-based time series classification: good results with an appropriate model; choice of model; existence of model. (figure: the training SERIES and their classes C yield MODEL I, MODEL II and MODEL III; PREDICTION for a new SERIES: what is the most probable model?)
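
The question "what is the most probable model?" can be sketched as a likelihood comparison. The example below assumes, purely for illustration, one Gaussian AR(1) model per class with known parameters (phi, sigma) and equal class priors; the slides do not state which model family is used, and the names are hypothetical.

```python
import math


def ar1_log_likelihood(x, phi, sigma):
    """Gaussian log-likelihood of x[1:] given x[0] under x[t] = phi * x[t-1] + N(0, sigma^2)."""
    ll = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        resid = cur - phi * prev
        ll += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return ll


def classify_by_model(x, class_models):
    """Pick the class whose model gives the series the highest likelihood."""
    return max(class_models, key=lambda c: ar1_log_likelihood(x, *class_models[c]))


# Hypothetical per-class parameters (phi, sigma), assumed to be already fitted.
class_models = {"C1": (0.9, 1.0), "C2": (-0.9, 1.0)}
print(classify_by_model([1.0, 0.9, 0.8, 0.7], class_models))  # C1
```

The result is only as good as the assumed model family, which is the "choice of model" and "existence of model" concern on the slide.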

Slide 45

Slide 45 text

Early time series classification. Examples: early activity recognition; early disease recognition in electrocardiograms; early detection of sepsis in newborns; early detection of failures in machines (predictive maintenance).

Slide 46

Slide 46 text

Early time series classification: balance between accuracy and earliness. (figure: a TRAINING SET of series labelled C1, C2, C3 is passed to an ALGORITHM that produces an EARLY CLASSIFIER; for an incomplete series the classifier either outputs a class, here C2, or waits for more data)

Slide 47

Slide 47 text

Early time series classification (figure: classes C1, C2, C3 evaluated at time points t1, t2, t3, ...; series T1, T2, T3)

Slide 48

Slide 48 text

Early time series classification (figure: classes C1, C2, C3 evaluated at time points t1, t2, t3, ...; series T1, T2, T3)

Slide 49

Slide 49 text

Early time series classification (figure: classes C1, C2, C3 evaluated at time points t1, t2, t3, ...; series T1, T2, T3). Output: BLUE class
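
The trade-off between accuracy and earliness can be illustrated with a simple stopping rule. The sketch below is not the method from the slides: it assumes a 1-NN classifier on prefixes and a hypothetical `margin` parameter, and it commits to a class as soon as the nearest class is closer than the runner-up by at least that margin. Training series are assumed to be at least as long as the stream.

```python
import math


def prefix_distance(a, b, t):
    """Euclidean distance using only the first t points of each series."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a[:t], b[:t])))


def early_classify(stream, train_series, train_labels, margin=1.0):
    """Return (label, t): the predicted class and the number of points consumed."""
    label = None
    for t in range(1, len(stream) + 1):
        best = {}  # smallest prefix distance seen for each class
        for series, c in zip(train_series, train_labels):
            d = prefix_distance(stream, series, t)
            if c not in best or d < best[c]:
                best[c] = d
        ranked = sorted(best.items(), key=lambda item: item[1])
        label = ranked[0][0]
        # Stop early once the closest class is clearly separated from the second closest.
        if len(ranked) == 1 or ranked[1][1] - ranked[0][1] >= margin:
            return label, t
    return label, len(stream)  # no confident decision: use the full series


train_series = [[0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0, 3.0]]
train_labels = ["C1", "C2"]
print(early_classify([0.0, 0.0, 1.0, 2.0, 3.0], train_series, train_labels))  # ('C2', 3)
```

A larger margin makes the decision later but safer; a smaller one makes it earlier but riskier.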

Slide 50

Slide 50 text

Multivariate time series classification: CHALLENGE

Slide 51

Slide 51 text

Outline of the presentation. Current section: 4 Outlier/Anomaly Detection

Slide 52

Slide 52 text

Outlier vs Anomaly

Slide 53

Slide 53 text

Type of outlier: point outlier

Slide 54

Slide 54 text

Type of outlier: subsequence outlier

Slide 55

Slide 55 text

Type of outlier: series outlier

Slide 56

Slide 56 text

Outlier detection method: basic. $|x_t - \hat{x}_t| < \tau$

Slide 57

Slide 57 text

Outlier detection method: basic. $|x_t - \hat{x}_t| < \tau$. Median

Slide 58

Slide 58 text

Outlier detection method: basic. $|x_t - \hat{x}_t| < \tau$. MAD

Slide 59

Slide 59 text

Outlier detection method: basic. $|x_t - \hat{x}_t| < \tau$. Model
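
A minimal sketch of the basic test follows, under the assumption that the expected value $\hat{x}_t$ is a centred rolling median and that the threshold $\tau$ is a multiple of the local MAD (median absolute deviation). A point is treated as normal while $|x_t - \hat{x}_t| < \tau$ holds and flagged once the deviation exceeds $\tau$. The function name and the parameters `window` and `k` are hypothetical.

```python
import statistics


def median_mad_outliers(x, window=5, k=3.0):
    """Return the indices t where |x[t] - median| exceeds k * MAD in a centred window."""
    half = window // 2
    flagged = []
    for t in range(len(x)):
        neighbourhood = x[max(0, t - half): t + half + 1]
        expected = statistics.median(neighbourhood)           # plays the role of x-hat_t
        mad = statistics.median(abs(v - expected) for v in neighbourhood)
        tau = k * mad                                         # local threshold
        if abs(x[t] - expected) > tau:
            flagged.append(t)
    return flagged


series = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 0.8, 1.1, 1.0]
print(median_mad_outliers(series))  # [4]
```

Replacing the rolling median with the prediction of a fitted model gives the model-based variant of the same test. Note that a perfectly flat window yields a MAD of zero, so in practice a lower bound on the threshold may be needed.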

Slide 60

Slide 60 text

An overview of outlier/anomaly detection

Slide 61

Slide 61 text

Outline of the presentation. Current section: 5 Conclusions and Future Work

Slide 62

Slide 62 text

Almost unexplored lands. Challenges: time series subset selection; learning in weakly supervised environments: semi-supervised, multi-label, crowd learning; theoretical bounds on learning: assumptions on the generating model.

Slide 63

Slide 63 text

Collaboration: Usue Mori (UPV/EHU), Amaia Abanda (BCAM), Ane Blazque (Ikerlan), Angel Conde (Ikerlan), Aritz Perez (BCAM), Izaskun Oregui (Tecnalia), Javier del Ser (Tecnalia), Josu Ircio (Ikerlan), Aizea Lojo (Ikerlan)

Slide 64

Slide 64 text

Time Series Data Mining Challenges. Jose A. Lozano, Basque Center for Applied Mathematics (BCAM), University of the Basque Country UPV/EHU. MACsPro, Vienna, March 21-23, 2019