
Monitoring Range Motif on Streaming Time-Series...

Shinya Kato
September 04, 2018

Monitoring Range Motif on Streaming Time-Series, presented at DEXA 2018

Transcript

  1. Monitoring Range Motif on Streaming Time-Series

    Shinya Kato, Daichi Amagata, Shunya Nishio, and Takahiro Hara
    Osaka University, Japan
  2. Background (1/2)

    - Recently, large amounts of time-series data have been collected.
    - Analyzing such data enables, for example:
      - Power consumption of home appliances: anomaly detection
      - Emissions of greenhouse gases: environment monitoring
      - Electrocardiogram: discovery of arrhythmia
  3. Background (2/2)

    - Range motif
      - One of the most important tools for analyzing time-series
      - A subsequence that appears repeatedly in a time-series
    - Figure: a range motif in a time-series.
  4. Application example

    - Event detection
      - Assume that you save the motif every day.
      - If the current motif is very different from the motifs of 1, 2, and 3 days ago, an anomalous event can be expected.
  5. Problem definition

    - Range motif monitoring on a streaming time-series under a count-based sliding window setting
      - When the window slides, a new value is inserted and the oldest value is removed.
      - Only the most recent $w$ values are considered; older values are not.
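A minimal sketch of the count-based sliding window described above, assuming the stream is any Python iterable. The name `sliding_windows` is hypothetical and not from the slides.

```python
from collections import deque

def sliding_windows(stream, w):
    """Yield the most recent w values after each arrival, once the window is full."""
    window = deque(maxlen=w)  # appending to a full deque drops the oldest value
    for value in stream:
        window.append(value)
        if len(window) == w:
            yield list(window)

windows = list(sliding_windows([1, 2, 3, 4, 5], w=3))
```

Each yielded list corresponds to one slide of the window: one value enters and the oldest one leaves.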
  6. Preliminary

    - Streaming time-series $t$
      - $t = (t[1], t[2], \cdots)$
    - Subsequence
      - $s_p = (t[p], t[p+1], \cdots, t[p+l-1])$
    - Pearson correlation
      - $\rho(s_p, s_q) = \frac{\sum_{i=0}^{l-1} t[p+i]\,t[q+i] - l\mu_p\mu_q}{l\sigma_p\sigma_q}$
      - $\mu_p$: mean of $s_p$; $\sigma_p$: standard deviation of $s_p$
      - Relationship with the z-normalized Euclidean distance [1]: $d(\hat{s}_p, \hat{s}_q) = \sqrt{2l(1 - \rho(s_p, s_q))}$

    [1] Mueen, A.: Time series join on subsequence correlation, ICDM (2014)
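The relationship between Pearson correlation and the z-normalized Euclidean distance can be checked numerically. The sketch below assumes the population standard deviation (dividing by the length), which is what makes the identity exact. The function names and sample values are illustrative only.

```python
import math

def mean_std(x):
    """Mean and population standard deviation of a sequence."""
    mu = sum(x) / len(x)
    return mu, math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))

def pearson(x, y):
    """Pearson correlation in the sum-of-products form used on the slide."""
    l = len(x)
    mu_x, sd_x = mean_std(x)
    mu_y, sd_y = mean_std(y)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot - l * mu_x * mu_y) / (l * sd_x * sd_y)

def znorm_distance(x, y):
    """Euclidean distance between the z-normalized versions of x and y."""
    mu_x, sd_x = mean_std(x)
    mu_y, sd_y = mean_std(y)
    return math.sqrt(sum(((a - mu_x) / sd_x - (b - mu_y) / sd_y) ** 2
                         for a, b in zip(x, y)))

s_p = [1.0, 2.0, 4.0, 3.0, 5.0]
s_q = [2.0, 1.0, 3.0, 5.0, 4.0]
rho = pearson(s_p, s_q)
direct = znorm_distance(s_p, s_q)                # computed from the definition
via_rho = math.sqrt(2 * len(s_p) * (1 - rho))    # computed from the correlation
```

Both `direct` and `via_rho` should agree up to floating-point rounding. A constant subsequence has zero standard deviation and would need separate handling.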
  7. Preliminary

    - Similar subsequence
      - $s_p$ and $s_q$ are similar to each other if $\rho(s_p, s_q) \ge \theta \iff d(\hat{s}_p, \hat{s}_q) \le \sqrt{2l(1 - \theta)}$.
    - Score
      - $score(s_p)$ is the number of subsequences similar to $s_p$.
    - Range motif $s^*$ [2]
      - The subsequence with the highest score
    - Figure: $\rho(s_p, s_q) \ge \theta$ and $\rho(s_p, s_r) \ge \theta$, so $score(s_p) = 2$.

    [2] Patel, P.: Mining motifs in massive time series databases, ICDM (2003)
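A brute-force sketch of the score and range motif definitions, written for a single static window. It assumes that every other subsequence counts as a potential match, including overlapping ones; the slides do not say whether overlapping (trivial) matches are excluded. `range_motif` is a hypothetical name.

```python
import math

def pearson(x, y):
    """Pearson correlation with population standard deviations."""
    l = len(x)
    mu_x, mu_y = sum(x) / l, sum(y) / l
    sd_x = math.sqrt(sum((v - mu_x) ** 2 for v in x) / l)
    sd_y = math.sqrt(sum((v - mu_y) ** 2 for v in y) / l)
    return (sum(a * b for a, b in zip(x, y)) - l * mu_x * mu_y) / (l * sd_x * sd_y)

def range_motif(window, l, theta):
    """Return (start index of the motif, scores of all subsequences)."""
    subs = [window[p:p + l] for p in range(len(window) - l + 1)]
    scores = [sum(1 for q, s_q in enumerate(subs)
                  if q != p and pearson(s_p, s_q) >= theta)
              for p, s_p in enumerate(subs)]
    best = max(range(len(subs)), key=scores.__getitem__)  # first index on ties
    return best, scores

best, scores = range_motif([0, 1, 0, 1, 0, 1, 0, 2], l=3, theta=0.9)
```

Here the subsequence starting at index 0 repeats twice more in the window, so it has the highest score. Computing every score this way is quadratic in the number of subsequences, which is what an incremental method has to avoid.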
  8. Baseline algorithm

    - By computing the Pearson correlation between every subsequence in the window and the expired or new subsequence, the Baseline algorithm updates the scores of all subsequences.
    - Figure: when an old value is deleted from the window, the expired subsequence is compared with the remaining subsequences; when a new value is observed, the new subsequence is compared with them.
  9. Problem & Research goal

    - Problem
      - The time complexity of one Pearson correlation computation is $O(l)$.
      - The number of computations is $w - l$.
      - The time complexity of the Baseline algorithm is $O((w - l)l)$.
    - Research goal
      - Speed up the score update when the window slides, and monitor the motif efficiently.
    - We propose the algorithm "SRMM" (Streaming Range Motif Monitoring).
  10. SRMM (new subsequence $s_n$) - Overview 1

    - SRMM reduces the dimensionality of the subsequences in the window with PAA [3] and maintains the reduced subsequences in a kd-tree.
    - Figure: subsequences in the window (length $l$) are reduced by PAA to length $\phi$, mapped to points in a $\phi$-dimensional space (mapping trick), and maintained by a kd-tree.

    [3] Keogh, E.: Dimensionality reduction for fast similarity search in large time series databases, KIS (2002)
  11. SRMM (new subsequence $s_n$) - Overview 2

    - If we can quickly determine that $score(s_n) < score(s^*)$, we can monitor the exact motif efficiently.
      - We propose a technique that efficiently obtains $score_{ub}(s_n)$, an upper bound of $score(s_n)$.
      - It prunes unnecessary exact score computations.
    - Flow of SRMM: $s_n$ (length $l$) → PAA → $s_n^\phi$ (length $\phi$) → insert into a kd-tree → range search → get $score_{ub}(s_n)$
  12. SRMM (new subsequence $s_n$) - PAA

    - PAA (Piecewise Aggregate Approximation)
      - A dimensionality reduction algorithm
      - Split a time-series into segments and take the mean of the values in each segment.
    - Figure: a time-series before and after the transformation, with one mean computed per segment.
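A minimal sketch of PAA as described above, assuming the input length is divisible by the number of segments `phi` (the general case needs fractional segment boundaries and is omitted here).

```python
def paa(x, phi):
    """Reduce x to phi values, each the mean of one equal-length segment.

    Assumes len(x) is divisible by phi.
    """
    seg = len(x) // phi  # number of values per segment
    return [sum(x[i * seg:(i + 1) * seg]) / seg for i in range(phi)]

reduced = paa([1, 3, 5, 7, 2, 4], phi=3)
```

The six input values are split into three segments of two values each, and each segment is replaced by its mean.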
  13. SRMM (new subsequence $s_n$) - PAA

    - To prune exact distance computations, we use PAA.
      - It uses the property that the distance between transformed subsequences becomes smaller.
    - Figure: PAA maps $s_p$ and $s_q$ (length $l$) to $s_p^\phi$ and $s_q^\phi$ (length $\phi$). Computing $d(s_p, s_q)$ takes $O(l)$, while computing $d(s_p^\phi, s_q^\phi)$ takes $O(\phi)$, with $d(s_p, s_q) \ge d(s_p^\phi, s_q^\phi) \ge \sqrt{2\phi(1 - \theta)}$.
    - In that case we know in $O(\phi)$ that $s_p$ and $s_q$ are not similar.
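The reduced-space threshold $\sqrt{2\phi(1 - \theta)}$ can be derived from the PAA lower-bounding property given in [3]. The derivation below is a sketch under two assumptions that the slides do not state explicitly: PAA is applied to the z-normalized subsequences, and $l$ is divisible by $\phi$.

```latex
% PAA lower bound (Keogh et al. [3]), for z-normalized subsequences:
\sqrt{\tfrac{l}{\phi}}\; d(s_p^{\phi}, s_q^{\phi}) \;\le\; d(\hat{s}_p, \hat{s}_q)

% If s_p and s_q are similar, then d(\hat{s}_p, \hat{s}_q) \le \sqrt{2l(1-\theta)}, hence
d(s_p^{\phi}, s_q^{\phi})
  \;\le\; \sqrt{\tfrac{\phi}{l}}\,\sqrt{2l(1-\theta)}
  \;=\; \sqrt{2\phi(1-\theta)}

% Contrapositive: a reduced distance above \sqrt{2\phi(1-\theta)}
% proves that s_p and s_q are not similar.
```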
  14. SRMM (new subsequence $s_n$) - Mapping trick

    - A subsequence of length $\phi$ can be regarded as a point in a $\phi$-dimensional space.
      - Subsequences with a large Pearson correlation are close to each other in the $\phi$-dimensional space.
      - All candidates for subsequences similar to $s_n$ lie within distance $\sqrt{2\phi(1 - \theta)}$ of $s_n^\phi$ in the $\phi$-dimensional space.
    - By range search, we can get $score_{ub}$ and the candidates for similar subsequences.
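The following sketch illustrates why counting points inside the reduced-space radius gives an upper bound on the exact score. It uses a plain linear scan instead of a kd-tree, and it assumes that PAA is applied to z-normalized subsequences and that the length is divisible by `phi`. All function names and the synthetic window are illustrative.

```python
import math

def znorm(x):
    """Z-normalize with the population standard deviation."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x]

def paa(x, phi):
    """Segment means; assumes len(x) is divisible by phi."""
    seg = len(x) // phi
    return [sum(x[i * seg:(i + 1) * seg]) / seg for i in range(phi)]

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def exact_and_upper_scores(window, l, phi, theta):
    subs = [znorm(window[p:p + l]) for p in range(len(window) - l + 1)]
    reduced = [paa(s, phi) for s in subs]
    exact_r = math.sqrt(2 * l * (1 - theta))
    reduced_r = math.sqrt(2 * phi * (1 - theta)) + 1e-9  # slack for rounding
    exact, upper = [], []
    for p in range(len(subs)):
        others = [q for q in range(len(subs)) if q != p]
        exact.append(sum(dist(subs[p], subs[q]) <= exact_r for q in others))
        upper.append(sum(dist(reduced[p], reduced[q]) <= reduced_r for q in others))
    return exact, upper

# Synthetic window with no constant subsequences (illustrative data only).
window = [math.sin(0.3 * i) + 0.1 * ((7 * i) % 5) for i in range(60)]
exact, upper = exact_and_upper_scores(window, l=8, phi=4, theta=0.9)
```

Every entry of `upper` is at least the matching entry of `exact`, because each truly similar pair also falls inside the reduced-space radius.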
  15. SRMM (new subsequence $s_n$) - kd-tree

    - Maintain the transformed subsequences in a kd-tree.
      - Range search using a kd-tree is fast (logarithmic order).
      - The number of subsequences within the range is $score_{ub}(s_n)$.
    - Cost of finding the subsequences within distance $\sqrt{2\phi(1 - \theta)}$ of $s_n^\phi$:
      - Without a kd-tree: $O(\phi(w - l))$
      - Range search using a kd-tree: $O(\phi \log(w - l))$
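A compact sketch of a kd-tree range search, checked against a linear scan. This is a static tree built once from all points; the incremental insertion and deletion that a sliding window needs are not shown, and the tuple-based node layout is an assumption made for brevity.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def build(points, depth=0):
    """Build a static kd-tree; each node is (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def range_search(node, center, radius, found):
    """Append to found every point within radius of center."""
    if node is None:
        return found
    point, axis, left, right = node
    if dist(point, center) <= radius:
        found.append(point)
    diff = center[axis] - point[axis]
    if diff <= radius:    # the ball reaches the side with smaller coordinates
        range_search(left, center, radius, found)
    if diff >= -radius:   # the ball reaches the side with larger coordinates
        range_search(right, center, radius, found)
    return found

random.seed(0)
points = [tuple(random.uniform(-1, 1) for _ in range(4)) for _ in range(200)]
center, radius = points[0], 0.6
in_range = range_search(build(points), center, radius, [])
brute = [p for p in points if dist(p, center) <= radius]
```

The query point itself is among the results, so an upper-bound score derived this way would be `len(in_range) - 1`.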
  16. SRMM (new subsequence $s_n$) - Pruning

    - Compare $score_{ub}(s_n)$ with $score(s^*)$.
      - If $score_{ub}(s_n) < score(s^*)$: because $s_n$ cannot be a motif, we can safely prune the computation of $score(s_n)$.
      - If $score_{ub}(s_n) \ge score(s^*)$: because $s_n$ can be a motif, we must compute the exact $score(s_n)$.
  17. SRMM (expired subsequence $s_e$)

    - When the window slides, the scores of the subsequences similar to $s_e$ decrease.
      - Each subsequence keeps a list of the subsequences similar to it.
      - The list of $s_e$ identifies the subsequences whose scores decrease.
    - Figure: $s_e$ has the list $(s_p, s_q, s_r)$; $s_e$ is deleted, and it is also deleted from the lists of $s_p$, $s_q$, and $s_r$.
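A minimal sketch of handling an expired subsequence with similar-subsequence lists, assuming the lists are stored as a dictionary of sets keyed by subsequence id and that a score equals the size of the set. The structure and names are illustrative, not taken from the slides.

```python
def expire(similar, e):
    """Remove expired subsequence e and return the ids whose score decreased."""
    affected = similar.pop(e, set())
    for p in affected:
        similar[p].discard(e)  # e no longer counts toward the score of p
    return affected

# Id 0 plays the role of the expired subsequence; 1, 2, 3 are similar to it.
similar = {0: {1, 2, 3}, 1: {0}, 2: {0, 3}, 3: {0, 2}}
affected = expire(similar, 0)
scores = {p: len(s) for p, s in similar.items()}
```

Only the subsequences listed for the expired one are touched, so no correlation has to be recomputed when a subsequence leaves the window.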
  18. Experiment

    - Dataset
      - GreenHouseGas
      - RefrigerationDevices
    - Parameters
      - Window size, $w$ [×10^3]: 5, 10, 15, 20
      - Motif length, $l$: 50, 100, 150, 200
      - Threshold, $\theta$: 0.75, 0.8, 0.85, 0.9, 0.95
    - Comparative algorithm
      - Baseline algorithm
    - Evaluation criterion
      - Update time: average time to update the motif when the window slides
  19. Result - Impact of window size $w$

    - Figure: update time [msec] versus $w$ [×10^3] for Baseline and SRMM, on GreenHouseGas and RefrigerationDevices.
    - SRMM is faster than Baseline.
  20. Result - Impact of motif length $l$

    - Figure: update time [msec] versus $l$ for Baseline and SRMM, on GreenHouseGas and RefrigerationDevices.
    - SRMM is not affected by $l$.
  21. Result - Impact of threshold $\theta$

    - Figure: update time [msec] versus $\theta$ for Baseline and SRMM, on GreenHouseGas and RefrigerationDevices.
    - SRMM is faster as $\theta$ increases.
  22. Conclusion

    - We have proposed SRMM, an efficient algorithm for monitoring a range motif.
      - By using PAA and a kd-tree, unnecessary score computations are reduced.
    - The results of our experiments show its efficiency and scalability.