Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Discord Monitoring for Streaming Time-series, presented at DEXA 2019

Discord Monitoring for Streaming Time-series, presented at DEXA 2019

Shinya Kato

August 27, 2019
Tweet

More Decks by Shinya Kato

Other Decks in Research

Transcript

  1. Background (1/2) ⚫Recently, many time-series have been collected. 1 Power

    consumption of home appliances Emissions of greenhouse gases Electrocardiogram Anomaly detection Environment monitoring Discovery of arrhythmia Analyze
  2. Background (2/2) ⚫Discord [Keogh ’06] - One of the most

    important tools for analyzing time-series - An outlier subsequence in time-series 2 [Keogh ’06] E. Keogh, “HOT SAX: Efficiently finding the most unusual time series subsequence,” ICDM-2006 ECG time-series data Discord ECG arrhythmias need to be discovered in real time.
  3. Problem definition ⚫Discord monitoring on a streaming time-series under a

    count-based sliding window setting - When the window slides, a new value is inserted and the oldest value is removed. - Consider only the most recent 𝑤 values 3 time most recent 𝑤 values old values aren’t considered.
  4. Preliminary (1/2) ⚫Discord - A subsequence that has the largest

    distance to its nearest neighbor in subsequences in the window 4 𝑡 𝑙 ⋱ 𝑤 − 𝑙 + 1 Compute nearest neighbor Discord ⁞ ⁞ 2.88 0.90 1.61 6.04 1.23 1.40
  5. Preliminary (2/2) ⚫In this research, we use 𝑧-normalized Euclidean distance

    between two subsequences. - A subsequence of length 𝑙 can be regarded as a point on a 𝑙-dimensional space. - Discord is the point that has the largest distance to its nearest neighbor point. 5 ⋱ 𝑤 − 𝑙 + 1 Represent as points on 𝑙-dimension Discord 𝑙-dimensional space
  6. ⚫When the window slides, recalculate the nearest neighbor of each

    subsequence. Naïve algorithm 6 It takes a lot of time, so cannot update a discord in real time. Deleted by window sliding Change NN Change NN Change NN Generated by window sliding Change NN Change NN Change NN
  7. Overview of proposed algorithm SDM ⚫We propose an algorithm SDM

    (Streaming Discord Monitoring) to monitor a discord. 1. Maintain the nearest neighbor of each subsequence in the window 2. Execute nearest neighbor search based on sequential scan 7 𝑠1 𝑠2 𝑠3 2.1 5.0 𝑁𝑁 = 2, 5.0 𝑁𝑁 = 2, 2.1 𝑁𝑁 = 3, 2.1 𝑁𝑁 = id, distance
  8. 𝒔⋅ NN 𝑠1 5, 1.0 𝑠2 7, 2.7 𝑠3 8,

    2.3 𝑠4 1, 1.1 𝑠5 1, 1.0 𝑠6 2, 3.0 𝑠7 3, 2.6 𝑠8 1, 1.4 𝑠9 𝑠2 𝑠1 Proposed algorithm SDM (Streaming Discord Monitoring) ⚫Maintain the nearest neighbor of each subsequence ⚫Nearest neighbor search based on sequential scan 8 While searching for NN of 𝒔𝟗 , we can update NN of the other subsequences. 𝑠3 𝑠4 𝑠5 𝑠6 𝑠7 𝑠8 𝒔⋅ NN 𝑠1 5, 1.0 𝑠2 7, 2.7 𝑠3 8, 2.3 𝑠4 1, 1.1 𝑠5 1, 1.0 𝑠6 2, 3.0 𝑠7 3, 2.6 𝑠8 1, 1.4 𝑠9 𝑑 𝑠2 , 𝑠9 = 1.5 9, 1.5 𝑑 𝑠3 , 𝑠9 = 3.5 𝑑 𝑠4 , 𝑠9 = 7.8 𝑑 𝑠5 , 𝑠9 = 8.8 𝑑 𝑠6 , 𝑠9 = 2.4 9, 2.4 𝑑 𝑠7 , 𝑠9 = 1.2 9, 1.2 𝑑 𝑠8 , 𝑠9 = 8.0 7, 1.2
  9. Problem of SDM ⚫When there are many subsequences whose nearest

    neighbors are unknown, the number of nearest neighbor searches is increased. 9 𝒔⋅ NN 𝑠1 5, 1.0 𝑠2 9, 1.5 𝑠3 8, 2.3 𝑠4 𝑠5 𝑠6 9, 2.4 𝑠7 9, 1.2 𝑠8 𝑠9 7, 1.2 We need to execute nearest neighbor search.
  10. ID NNolder NNyounger NN 1 − 5, 1.0 5, 1.0

    2 − 9, 1.5 9, 1.5 3 − 8, 2.3 8, 2.3 4 − 5, 1.5 − 5 − 8, 1.8 − 6 2, 3.0 9, 2.4 9, 2.4 7 3, 2.2 9, 1.2 9, 1.2 8 − 9, 4.2 − 9 7, 1.2 − 7, 1.2 ID NNolder NNyounger NN 1 − 5, 1.0 5, 1.0 2 − 7, 2.7 7, 2.7 3 − 8, 2.3 8, 2.3 4 1, 1.1 5, 1.5 1, 1.1 5 1, 1.0 8, 1.8 1, 1.0 6 2, 3.0 7, 3.1 2, 3.0 7 3, 2.6 8, 3.2 3, 2.6 8 1, 1.4 − 1, 1.4 9 Proposed algorithm SDM (Streaming Discord Monitoring) ⚫NNolder : NN to subsequences before it is generated ⚫NNyounger : NN to subsequences after it is generated 10 We can always maintain NNyounger accurately, and identify subsequences that can be discord. 𝑠2 𝑠1 𝑠3 𝑠4 𝑠5 𝑠6 𝑠7 𝑠8 𝑠9 not discord * * * *
  11. Experiment ⚫Dataset - Google-cpu: Time-series of Google data center CPU

    usage ⚫Parameters - Window-size 𝑤 [× 103] : 5, 10, 150, 200 - Discord length 𝑙 : 50, 100, 150, 200 ⚫Evaluation algorithms - HOT SAX : This is a discord detection algorithm for a static times- series, and we extend this to the algorithm that supports streaming time-series - N-SDM : This has NN, but does not have NNolder and NNyounger - SDM : Proposed algorithm - A-SDM : Approximation version of SDM 11
  12. Results – Impact of window-size 𝑤 12 SDM and A-SDM

    are faster than the other algorithms. 0 30 60 90 120 150 5 10 15 20 Average update time [msec] 𝑤 [×103] SDM N-SDM A-SDM HOTSAX 10 100 1000 10000 100000 5 10 15 20 Worst update time [msec] 𝑤 [×103] SDM N-SDM A-SDM HOTSAX
  13. Results – Impact of discord length 𝒍 13 0 30

    60 90 120 150 50 100 150 200 Average update time [msec] 𝑙 SDM N-SDM A-SDM HOTSAX 10 100 1000 10000 100000 50 100 150 200 Worst update time [msec] 𝑙 SDM N-SDM A-SDM HOTSAX SDM and A-SDM are faster than the other algorithms.
  14. Conclusion ⚫We have proposed an efficient algorithm SDM to monitor

    a discord. - Nearest neighbor search based on sequential scan - Maintain the nearest neighbor of each subsequence and the distance ⚫The results of our experiments show the efficiency and scalability of SDM. 14