Time Series Data Mining Challenges

MACSPro'2019 - Modeling and Analysis of Complex Systems and Processes, Vienna
21 - 23 March 2019

Prof. Jose A. Lozano

Conference website http://macspro.club/

Website https://exactpro.com/
Linkedin https://www.linkedin.com/company/exactpro-systems-llc
Instagram https://www.instagram.com/exactpro/
Twitter https://twitter.com/exactpro
Facebook https://www.facebook.com/exactpro/
Youtube Channel https://www.youtube.com/c/exactprosystems

Exactpro

March 22, 2019

Transcript

  1. Time Series Data Mining Challenges
    Time Series Data Mining Challenges
    Jose A. Lozano
    Basque Center for Applied Mathematics (BCAM)
    University of the Basque Country UPV/EHU
    MACsPro, Vienna, March 21-23, 2019

  2. Time Series Data Mining Challenges
    Basque Country

  3. Time Series Data Mining Challenges
    Donostia-San Sebastián

  4. Time Series Data Mining Challenges
    Bilbao

  5. Time Series Data Mining Challenges
    Outline of the presentation
    1 Time Series Data Mining Activities
    2 Clustering
    3 (Early) Supervised Classification
    4 Outlier/Anomaly Detection
    5 Conclusions and Future Work

  6. Time Series Data Mining Challenges
    Time Series Data Mining Activities

  7. Time Series Data Mining Challenges
    Time Series Data Mining Activities
    Time series all around
    Temporal correlation
    High dimensionality
    Noisy
    Industry 4.0
    Bio Signals
    Weather Forecasting
    Shapes

  8. Time Series Data Mining Challenges
    Time Series Data Mining Activities
    Time series forecasting

  9. Time Series Data Mining Challenges
    Time Series Data Mining Activities
    Time series database: our object of study
    A set of time series (usually big)
    Different lengths
    Multidimensional

  10. Time Series Data Mining Challenges
    Time Series Data Mining Activities
    Clustering
    [Figure: Clustering Algorithm]

  11. Time Series Data Mining Challenges
    Time Series Data Mining Activities
    Supervised classification of time series
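    [Figure: an ALGORITHM learns a CLASSIFIER from a TRAINING SET of series labelled C1, C2, C3, C2, C3, C1; the CLASSIFIER assigns class C2 to a new, unlabelled series (?)]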

  12. Time Series Data Mining Challenges
    Time Series Data Mining Activities
    Anomaly/outlier detection

  13. Time Series Data Mining Challenges
    Time Series Data Mining Activities
    Segmentation

  14. Time Series Data Mining Challenges
    Clustering

  15. Time Series Data Mining Challenges
    Clustering
    Time series clustering. Examples
    [Figure: Clustering Algorithm]

  16. Time Series Data Mining Challenges
    Clustering
    Time series clustering: hierarchical, partitional
    We need a DISTANCE
    [Figure: Series 1 to Series 9, time axis from 0 to 400; k-means]

  17. Time Series Data Mining Challenges
    Clustering
    Distance between time series
    Rigid Distance Flexible Distance

  18. Time Series Data Mining Challenges
    Clustering
    Euclidean Distance (ED)
    $D(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
    Easy to compute
    Only for series with the same length
    Does not consider the time ordering
    Sensitive to noise
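
    A minimal sketch of the Euclidean distance above, in plain Python. The function name `euclidean_distance` is an illustrative choice, not something taken from the slides; the length check mirrors the "same length" restriction.

```python
import math

def euclidean_distance(x, y):
    """Euclidean distance between two series of the same length."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs series of the same length")
    # Points are compared index by index; the time axis is never warped.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

    Because points are compared index by index, a peak that is merely shifted in time still contributes in full to the distance.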

  19. Time Series Data Mining Challenges
    Clustering
    Dynamic Time Warping (DTW)
    Takes into account the ordered sequence (time)
    It can deal with series of different sizes
    Computationally expensive: $O(\min\{m, n\}^2)$
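
    A minimal dynamic-programming sketch of DTW, assuming a squared pointwise cost and no warping-window constraint (both are choices made here, not taken from the slides). This naive version fills the full (m + 1) x (n + 1) table; the name `dtw_distance` is illustrative.

```python
import math

def dtw_distance(x, y):
    """Dynamic Time Warping distance between two series of any lengths."""
    m, n = len(x), len(y)
    # cost[i][j] = cheapest cumulative cost of aligning x[:i] with y[:j]
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            # Extend the best of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j],      # x[i-1] matched again
                                 cost[i][j - 1],      # y[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[m][n])
```

    Under these assumptions, `dtw_distance([0, 0, 1, 0, 0], [0, 0, 0, 1, 0])` is zero, because the shifted peak can be aligned by warping, and series of different lengths can be compared directly.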

  20. Time Series Data Mining Challenges
    Clustering
    Euclidean Distance vs Dynamic Time Warping
    [Figure with two panels: EUCLIDEAN and DTW]

  21. Time Series Data Mining Challenges
    Clustering
    Alternatives to calculate distances
    Calculating distances
    Represent each series by means of a set of features and calculate the distance between the features
    Learn a parametric model for each series and calculate the distance between the parameters
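
    One possible reading of the second alternative, sketched with NumPy: fit an autoregressive model to each series by least squares and compare the coefficient vectors. The AR order, the absence of an intercept, the mean-centring and the names `ar_coefficients` and `model_distance` are all assumptions made for illustration.

```python
import numpy as np

def ar_coefficients(series, order=2):
    """Least-squares fit of x_t = a_1 x_{t-1} + ... + a_p x_{t-p} on the centred series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # Column k holds x_{t-1-k} for t = order, ..., len(x) - 1.
    design = np.column_stack(
        [x[order - 1 - k : len(x) - 1 - k] for k in range(order)]
    )
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coeffs

def model_distance(a, b, order=2):
    """Euclidean distance between the fitted AR parameters of two series."""
    return float(np.linalg.norm(ar_coefficients(a, order) - ar_coefficients(b, order)))
```

    Since only the fitted parameters are compared, the two series do not need to have the same length.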

  22. Time Series Data Mining Challenges
    Clustering
    Distances between series
    Remarks
    There is no best distance (no free lunch)
    Each problem requires a different distance
    The distance to be used needs to agree with our knowledge about what is far and what is close
    Hint: try several distances
    Challenge:
    Design a method for the (semi)automatic selection of a distance

  23. Time Series Data Mining Challenges
    Clustering
    ...Coming back to clustering: k-means, k-medoids
    [Figure with two panels: k-means and k-medoids]
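
    A minimal sketch of k-medoids on a precomputed distance matrix, assuming the simple alternating scheme (assign each series to its nearest medoid, then pick the member with the smallest total within-cluster distance as the new medoid). The function name and parameters are illustrative. Unlike k-means, no mean series has to be computed, so any distance between series can be plugged in.

```python
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Alternating k-medoids on an (n x n) matrix of pairwise distances."""
    dist = np.asarray(dist, dtype=float)
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(dist), size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each series joins its closest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # Update step: the member closest to all other members.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels
```

    The matrix `dist` can be filled with whichever series distance suits the problem, for example a Euclidean or an elastic one.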

  24. Time Series Data Mining Challenges
    Clustering
    Remarks on clustering
    Recent papers on the computation of a mean series
    Alternative clustering methods: graph-based, spectral...
    Challenge: multivariate time series clustering is almost unexplored

  25. Time Series Data Mining Challenges
    (Early) Supervised Classification

  26. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Supervised Classification of Time Series
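    [Figure with two panels. General-purpose classifiers: a CLASSIFIER is learned from FEATURES with class C and applied to the FEATURES of a new instance. Specific TS classifiers: a CLASSIFIER is learned from SERIES with class C and applied to a new SERIES]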

  27. Time Series Data Mining Challenges
    (Early) Supervised Classification
    General-purpose classifiers
    Each series is considered an instance
    Each time stamp is considered a feature
    t1    t2    t3    ...   tn    C
    x11   x12   x13   ...   x1n   c1
    x21   x22   x23   ...   x2n   c2
    ...   ...   ...   ...   ...   ...
    xm1   xm2   xm3   ...   xmn   c2

  28. Time Series Data Mining Challenges
    (Early) Supervised Classification
    General-purpose classifiers
    Each series is considered an instance
    Each time stamp is considered a feature
    t2    t1    t3    ...   tn    C
    x12   x11   x13   ...   x1n   c1
    x22   x21   x23   ...   x2n   c2
    ...   ...   ...   ...   ...   ...
    xm2   xm1   xm3   ...   xmn   c2

  29. Time Series Data Mining Challenges
    (Early) Supervised Classification
    General-purpose classifiers
    Each series is considered an instance
    Each time stamp is considered a feature
    t2    t1    t3    ...   tn    C
    x12   x11   x13   ...   x1n   c1
    x22   x21   x23   ...   x2n   c2
    ...   ...   ...   ...   ...   ...
    xm2   xm1   xm3   ...   xmn   c2
    CHALLENGE I
    When to use general-purpose and when time-series specific?
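
    The tables with swapped columns suggest that a general-purpose classifier does not care about the order of the time stamps. A small sketch of that point, assuming a 1-NN classifier with Euclidean distance on the raw columns and synthetic integer-valued data (all names and data here are illustrative):

```python
import numpy as np

def one_nn_predict(train_X, train_y, query):
    """1-NN on raw time stamps, each time stamp treated as an ordinary feature."""
    dists = np.linalg.norm(train_X - query, axis=1)
    return int(train_y[np.argmin(dists)])

rng = np.random.default_rng(0)
# Integer-valued series keep the sums of squares exact, so ties behave identically.
train_X = rng.integers(-5, 6, size=(20, 30)).astype(float)
train_y = rng.integers(0, 2, size=20)
queries = rng.integers(-5, 6, size=(5, 30)).astype(float)

perm = rng.permutation(30)  # shuffle the time stamps (columns) consistently
original = [one_nn_predict(train_X, train_y, q) for q in queries]
shuffled = [one_nn_predict(train_X[:, perm], train_y, q[perm]) for q in queries]
print(original == shuffled)
```

    The predictions coincide before and after the shuffle, which means such a classifier cannot exploit temporal structure; a time-series-specific method (for example one based on an elastic distance) would be affected by the permutation.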

  30. Time Series Data Mining Challenges
    (Early) Supervised Classification
    What is relevant in TSC?
    PROBLEM I PROBLEM II

  31. Time Series Data Mining Challenges
    (Early) Supervised Classification
    What is relevant in TSC?
    PROBLEM I PROBLEM II
    SHAPE

  32. Time Series Data Mining Challenges
    (Early) Supervised Classification
    What is relevant in TSC?
    PROBLEM I PROBLEM II
    SHAPE LOCATION

  33. Time Series Data Mining Challenges
    (Early) Supervised Classification
    A taxonomy of time series classification methods
    Taxonomy
    Distance-based classifiers
    Model-based classifiers
    Feature-based classifiers
    Shapelets-based classifiers

  34. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Distance-based time series classification
    General Schema
    Define a distance between time series
    Use classifiers based on distances:
    1-NN
    ...

  35. Time Series Data Mining Challenges
    (Early) Supervised Classification
    1-Nearest Neighbour (1-NN)
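    Easy to understand
    Better results with higher number of series
    Computational cost
    Challenge: What distance???
    [Figure: the distances (d1, d2, d3, d4, d5, d6) from the unlabelled series (?) to the training series labelled C1, C2, C3, C2, C3, C1 are computed; the MINIMUM DISTANCE gives the predicted class C2]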
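
    A minimal sketch of 1-NN with a pluggable distance, so that the "what distance?" question becomes a function argument. The names `one_nn_classify` and `abs_distance` are illustrative, and the toy distance is only a placeholder for whichever series distance is chosen.

```python
def one_nn_classify(query, train_series, train_labels, distance):
    """Return the label of the closest training series and that distance."""
    best_label, best_dist = None, float("inf")
    for series, label in zip(train_series, train_labels):
        d = distance(query, series)
        if d < best_dist:  # keep the minimum distance seen so far
            best_label, best_dist = label, d
    return best_label, best_dist

def abs_distance(x, y):
    """Placeholder distance: sum of absolute pointwise differences."""
    return sum(abs(a - b) for a, b in zip(x, y))
```

    Every prediction computes one distance per training series, which is where the computational cost grows with the size of the training set.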

  36. Time Series Data Mining Challenges
    (Early) Supervised Classification
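    Distance-based time series classification. General Approach
    [Figure: the labelled training series (SERIES, C) are turned into a DISTANCE MATRIX from which the CLASSIFIER is learned; a new SERIES is turned into its DISTANCES to the training series and passed to the CLASSIFIER to obtain C]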

  37. Time Series Data Mining Challenges
    (Early) Supervised Classification
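    Distance-based time series classification. General Approach
    [Figure: the labelled training series (SERIES, C) are turned into a DISTANCE MATRIX from which the CLASSIFIER is learned; a new SERIES is turned into its DISTANCES to the training series and passed to the CLASSIFIER to obtain C. The rows of distances are VECTORS!]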

  38. Time Series Data Mining Challenges
    (Early) Supervised Classification
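    Distance-based time series classification. General Approach
    Any algorithm based on distance could be applied
    It depends on the number of series in training
    Computationally expensive
    [Figure: the labelled training series (SERIES, C) are turned into a DISTANCE MATRIX from which the CLASSIFIER is learned; a new SERIES is turned into its DISTANCES to the training series and passed to the CLASSIFIER to obtain C. The rows of distances are VECTORS!]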
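
    A sketch of the general approach: each series is replaced by its vector of distances to the training series, after which any vector classifier can be used. The nearest-centroid classifier, the toy distance and all names below are illustrative assumptions, not the method of the slides.

```python
import numpy as np

def distance_vectors(series_list, reference_list, distance):
    """Row i holds the distances from series i to every reference (training) series."""
    return np.array(
        [[distance(s, r) for r in reference_list] for s in series_list], dtype=float
    )

def fit_centroids(vectors, labels):
    """A deliberately simple vector classifier: one mean vector per class."""
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    centroids = np.array([vectors[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(vectors, classes, centroids):
    """Assign each distance vector to the class with the nearest centroid."""
    gaps = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
    return [classes[i] for i in gaps.argmin(axis=1)]

def abs_distance(x, y):
    """Placeholder series distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

train = [[0, 0, 0, 0], [0, 1, 0, 0], [5, 5, 5, 5], [5, 6, 5, 5]]
labels = ["a", "a", "b", "b"]
classes, centroids = fit_centroids(distance_vectors(train, train, abs_distance), labels)
new_vectors = distance_vectors([[0, 0, 1, 0], [4, 5, 5, 5]], train, abs_distance)
predictions = predict(new_vectors, classes, centroids)
```

    The length of each vector equals the number of training series, which is why the representation depends on the training set size and becomes expensive as it grows.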

  39. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Feature-based time series classification
    [Figure: FEATURES are extracted from the labelled training series (SERIES, C) and used to learn the CLASSIFIER; the FEATURES of a new SERIES are passed to the CLASSIFIER to obtain C]

  40. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Feature-based time series classification
    Features
    Statistics: mean, variance
    Autoregressive coefficients
    Fourier coefficients
    Shift, trend, ...
    [Figure: FEATURES are extracted from the labelled training series (SERIES, C) and used to learn the CLASSIFIER; the FEATURES of a new SERIES are passed to the CLASSIFIER to obtain C]
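
    A sketch of a feature extractor covering the kinds of features listed above. The concrete choices are assumptions made for illustration: the lag-1 autocorrelation stands in for an AR(1) coefficient, the trend is the slope of a least-squares line, and the first `n_fourier` Fourier magnitudes are kept (the series is assumed to have more than `2 * n_fourier` points).

```python
import numpy as np

def extract_features(series, n_fourier=3):
    """Map a series of any length to a fixed-length feature vector."""
    x = np.asarray(series, dtype=float)
    centered = x - x.mean()
    # Lag-1 autocorrelation (Yule-Walker estimate of an AR(1) coefficient).
    denom = np.dot(centered, centered)
    lag1 = np.dot(centered[:-1], centered[1:]) / denom if denom > 0 else 0.0
    # Linear trend: slope of the least-squares line through the series.
    slope = np.polyfit(np.arange(len(x)), x, 1)[0]
    # Magnitudes of the lowest non-constant Fourier coefficients.
    spectrum = np.abs(np.fft.rfft(centered))[1 : n_fourier + 1] / len(x)
    return np.concatenate(([x.mean(), x.var(), lag1, slope], spectrum))
```

    The output has the same length whatever the length of the input series, so the vectors can be passed to any general-purpose classifier.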

  41. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Feature-based time series classification
    Representation independent of the number of series
    Interpretable representation
    Challenge: what features to use?
    [Figure: FEATURES are extracted from the labelled training series (SERIES, C) and used to learn the CLASSIFIER; the FEATURES of a new SERIES are passed to the CLASSIFIER to obtain C]

  42. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Shapelets-based classification
    $L_{ij}$ could be distance or presence
    Computationally expensive
    When the shapelets are relevant, extremely good results
    Easy to interpret
    [Figure label: Shapelet 2]
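
    A sketch of how the entries $L_{ij}$ could be computed, assuming that the distance between series i and shapelet j is the smallest Euclidean distance between the shapelet and any window of the series, and that "presence" means this distance is below a threshold. The names and the threshold are illustrative; each series is assumed to be at least as long as each shapelet.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    x = np.asarray(series, dtype=float)
    s = np.asarray(shapelet, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(x, len(s))
    return float(np.min(np.linalg.norm(windows - s, axis=1)))

def shapelet_transform(series_list, shapelets, threshold=None):
    """Matrix L with one row per series and one column per shapelet."""
    L = np.array([[shapelet_distance(x, s) for s in shapelets] for x in series_list])
    if threshold is None:
        return L                            # L_ij as a distance
    return (L <= threshold).astype(int)     # L_ij as presence (1) or absence (0)
```

    The sliding comparison over every window of every series is the source of the computational cost mentioned above; how the shapelets themselves are found is not covered by this sketch.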

  43. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Model-based time series classification
    [Figure: the labelled training series (SERIES, C) lead to PREDICTION MODEL I, PREDICTION MODEL II and PREDICTION MODEL III; for a new SERIES: what is the most probable model?]

  44. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Model-based time series classification
    Good results with an appropriate model
    Choice of model
    Existence of model
    [Figure: the labelled training series (SERIES, C) lead to PREDICTION MODEL I, PREDICTION MODEL II and PREDICTION MODEL III; for a new SERIES: what is the most probable model?]
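
    A sketch of the "most probable model" idea, assuming one Gaussian AR(1) model per class, fitted by least squares, and equal class priors. The model family, the helper `simulate_ar1` (used only to produce toy data) and all names are assumptions for illustration.

```python
import numpy as np

def fit_ar1(series_list):
    """Fit x_t = phi * x_{t-1} + noise on all series of one class."""
    prev = np.concatenate([np.asarray(s, dtype=float)[:-1] for s in series_list])
    nxt = np.concatenate([np.asarray(s, dtype=float)[1:] for s in series_list])
    phi = np.dot(prev, nxt) / np.dot(prev, prev)
    sigma2 = np.mean((nxt - phi * prev) ** 2)  # residual variance
    return phi, sigma2

def log_likelihood(series, model):
    """Gaussian log-likelihood of a series, conditional on its first value."""
    phi, sigma2 = model
    x = np.asarray(series, dtype=float)
    resid = x[1:] - phi * x[:-1]
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + resid ** 2 / sigma2)

def classify(series, models):
    """Pick the class whose model gives the series the highest likelihood."""
    return max(models, key=lambda c: log_likelihood(series, models[c]))

def simulate_ar1(phi, n, rng):
    """Hypothetical helper that generates toy AR(1) data."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x
```

    If the chosen model family does not describe the data well, the likelihood comparison carries little information, which is the "choice of model" and "existence of model" caveat above.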

  45. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Early time series classification
    Examples
    Early activity recognition
    Early disease recognition in electrocardiograms
    Early detection of sepsis in newborns
    Early detection of failures in machines (predictive maintenance)

  46. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Early time series classification
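    Balance between accuracy and earliness
    [Figure: an ALGORITHM learns an EARLY CLASSIFIER from a TRAINING SET of series labelled C1, C2, C3, C2, C3, C1; for an incomplete series (?) the classifier either outputs a class (C2) or decides to wait for more data]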
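
    One simple way to illustrate the trade-off, not the method of the slides: classify prefixes with 1-NN and commit only when the nearest class is clearly closer than every other class. The margin rule, the parameters `min_len` and `margin`, and the function name are assumptions; training series are assumed to share one length that is at least the length of the incoming series, with at least two classes.

```python
import numpy as np

def early_classify(stream, train_X, train_y, min_len=5, margin=2.0):
    """Return (label, time) at the first prefix where the decision looks reliable."""
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    classes = np.unique(train_y)
    label = None
    for t in range(min_len, len(stream) + 1):
        prefix = np.asarray(stream[:t], dtype=float)
        # Distance from the observed prefix to the same-length prefix of each training series.
        d = np.linalg.norm(train_X[:, :t] - prefix, axis=1)
        per_class = np.array([d[train_y == c].min() for c in classes])
        order = np.argsort(per_class)
        label = classes[order[0]]
        # Commit only if the runner-up class is at least `margin` times farther away.
        if per_class[order[1]] > margin * per_class[order[0]]:
            return label, t
        # Otherwise: wait for more data.
    return label, len(stream)
```

    A larger `margin` tends to delay the decision in exchange for more confidence, and a smaller one does the opposite.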

  47. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Early time series classification
    [Figure: time stamps t1, t2, t3, ...; classes C1, C2, C3; T1, T2, T3]

  48. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Early time series classification
    [Figure: time stamps t1, t2, t3, ...; classes C1, C2, C3; T1, T2, T3]

  49. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Early time series classification
    [Figure: time stamps t1, t2, t3, ...; classes C1, C2, C3; T1, T2, T3]
    Output BLUE class

  50. Time Series Data Mining Challenges
    (Early) Supervised Classification
    Multivariate time series classification
    CHALLENGE

  51. Time Series Data Mining Challenges
    Outlier/Anomaly Detection

  52. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Outlier vs Anomaly

  53. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Type of outlier: point outlier

  54. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Type of outlier: subsequence outlier

  55. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Type of outlier: series outlier

  56. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Outlier detection method: basic
    $|x_t - \hat{x}_t| < \tau$

  57. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Outlier detection method: basic
    $|x_t - \hat{x}_t| < \tau$
    Median

  58. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Outlier detection method: basic
    $|x_t - \hat{x}_t| < \tau$
    MAD
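
    A sketch of the basic scheme with a median as the expected value and a MAD-based threshold: points that violate $|x_t - \hat{x}_t| < \tau$ are flagged. The sliding window, the edge padding, the multiplier `k` and the function name are assumptions made for illustration; 1.4826 is the usual factor that makes the MAD comparable to a standard deviation for Gaussian data.

```python
import numpy as np

def mad_outliers(series, window=11, k=3.0):
    """Flag points far from the local median, relative to the local MAD."""
    x = np.asarray(series, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, 2 * half + 1)
    x_hat = np.median(windows, axis=1)                            # expected value
    mad = np.median(np.abs(windows - x_hat[:, None]), axis=1)     # local spread
    tau = np.maximum(k * 1.4826 * mad, 1e-12)                     # threshold
    return np.abs(x - x_hat) > tau
```

    For example, on a sampled sine wave with one large spike added at a single position, only that position should be flagged under these settings.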

  59. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    Outlier detection method: basic
    $|x_t - \hat{x}_t| < \tau$
    Model

  60. Time Series Data Mining Challenges
    Outlier/Anomaly Detection
    An overview of outlier/anomaly detection

  61. Time Series Data Mining Challenges
    Conclusions and Future Work

  62. Time Series Data Mining Challenges
    Conclusions and Future Work
    Almost unexplored lands
    Challenges
    Time series subset selection
    Learning in weakly supervised environments: semi-supervised, multi-label, crowd learning
    Theoretical bounds on learning: assumptions on the generating model

  63. Time Series Data Mining Challenges
    Conclusions and Future Work
    Collaboration
    Usue Mori (UPV/EHU), Amaia Abanda (BCAM)
    Ane Blázquez (Ikerlan), Angel Conde (Ikerlan)
    Aritz Perez (BCAM), Izaskun Oregui (Tecnalia), Javier del Ser (Tecnalia)
    Josu Ircio (Ikerlan), Aizea Lojo (Ikerlan)

  64. Time Series Data Mining Challenges
    Conclusions and Future Work
    Time Series Data Mining Challenges
    Jose A. Lozano
    Basque Center for Applied Mathematics (BCAM)
    University of the Basque Country UPV/EHU
    MACsPro, Vienna, March 21-23, 2019
