
Change Detection in Streaming Data

Dang-Hoan Tran
September 10, 2015

Dissertation defense

Transcript

  1. Change Detection in Streaming Data. Dang-Hoan Tran, Thesis Defense

    10/8/2013 Change Detection in Streaming Data
  2. Outline

    1. Motivation
    2. Change Detection in a Single Data Stream
       2.1 Window-based Change Detection
       2.2 Synopsis-based Change Detection
       2.3 Clustering-based Change Detection
    3. Decentralized Change Detection
    4. Conclusion
  3. Requirements of Change Detection

    • Detection accuracy
      – Maximize the probability of detection
      – Minimize the probability of false alarms
    • Promptness
      – Minimize detection delay
    • Online processing
      – Limited memory, computation, and energy
      – Limit the number of scans over the data
      – Construct synopses
    10/8/2013 Motivation and Concepts
  4. Challenges Facing Change Detection

    • Diversity of change detection
      – Signal detection in noise
      – Change point detection and hypothesis testing
    • Dynamic and infinite nature of streaming data
      – Limited computation, space, and communication
      – Limit the number of scans over the data
    • Distributed nature of data
      – Fault tolerance
      – Data synchronization
      – Communication
  5. Change Detection in Streaming Data

    • Change detection: identifying differences in the state of an object or phenomenon by observing it at different times and/or at different locations in space
    • Change detection in streaming data: the result of a continuous change detector on streaming data can be considered a sequence of results of a one-time change detector
  6. Change Detection in a Single Data Stream

    Published results: MoMM and MDM 2011
    10/8/2013 Change Detection in A Single Data Stream
  7. Change Detection in a Single Stream

    • A change in the data leads to a change in the model
      – Update the model based on the change
      – Rebuild the model when a change occurs
    • Detect change in the data based on model change
  8. Change Detection Levels

    [Figure: local change detection levels: Change Detection in Raw Data, Synopsis-based Change Detection, Model-based Change Detection; annotations: "Window too large", "Detect change in data feature", "Detect change in multivariate data"]
  9. Window-based Change Detection

    • Window
      – The data stream is infinite
    • Sliding window
      – The value of data decreases over time
      – Only recent data are of interest
    • Two windows for quantifying change
      – Adapts to the environment
      – An absolute threshold is not adaptive
    • Nonparametric change detection
      – The distributions before and after the change are unknown
  10. Tuple-based Sliding Window

    • Sliding window
    • Overlapping windows model
    • Adjacent windows model
    [Figure: reference window and current window under the overlapping and adjacent models; a window of N tuples slides by admitting b new incoming tuples and expiring b tuples, keeping (N-b) old tuples]
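The two window models can be sketched in Python. This is a minimal illustration, not the thesis implementation: `TupleSlidingWindow` and `reference_and_current` are hypothetical names, and the windows are assumed to hold scalar readings.

```python
from collections import deque
from typing import Deque, Iterable, List, Sequence, Tuple


class TupleSlidingWindow:
    """Hypothetical helper: keeps at most n tuples; each slide admits b new ones."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.items: Deque[float] = deque()

    def slide(self, new_tuples: Iterable[float]) -> List[float]:
        """Append the new tuples and return the tuples that expired."""
        expired: List[float] = []
        for x in new_tuples:
            self.items.append(x)
            if len(self.items) > self.n:
                expired.append(self.items.popleft())
        return expired


def reference_and_current(
    history: Sequence[float], n: int, b: int, model: str
) -> Tuple[List[float], List[float]]:
    """Return (reference, current) windows over the most recent items of history.

    Assumes b >= 1 and that history holds at least n + b (overlapping)
    or 2n (adjacent) items.
    """
    if b < 1:
        raise ValueError("b must be at least 1")
    if model == "overlapping":
        # The current window is the reference window slid forward by b tuples,
        # so the two windows share n - b tuples.
        return list(history[-n - b:-b]), list(history[-n:])
    if model == "adjacent":
        # The reference window ends where the current window begins.
        return list(history[-2 * n:-n]), list(history[-n:])
    raise ValueError(f"unknown window model: {model}")
```

For example, with n = 3, sliding in [9, 8, 8] expires nothing; sliding in [5, 8] next expires [9, 8] and leaves [8, 5, 8] in the window.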
  11. Change Criteria

    • Distance-based change detection
      – Distance function d
      – Detection threshold thresh
      – Reference window $w_1$ and current window $w_2$
    $$H_0: d(w_1, w_2) \le \mathit{thresh}, \qquad H_1: d(w_1, w_2) > \mathit{thresh}$$
    A change is reported when $d(w_1, w_2) > \mathit{thresh}$.
  12. Distance Function

    • Distance function between two windows
      – Statistical distance: quantifies change statistically, at a high computational cost
      – Geometric distance: lower computational cost
    • Selection of the distance function depends on
      – Detection accuracy
      – Computational complexity
    $$d(w_1, w_2) \ge 0, \qquad d(w_1, w_2) = d(w_2, w_1), \qquad d(w_1, w_2) = 0 \iff w_1 = w_2$$
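A minimal sketch of the two geometric distances compared later in the talk (Euclidean and Manhattan), together with the distance-based change criterion. The function names are illustrative, and windows are assumed to be equal-length sequences of numbers. Both distances satisfy the three properties listed above.

```python
import math
from typing import Callable, Sequence

Window = Sequence[float]


def euclidean(w1: Window, w2: Window) -> float:
    """Geometric (L2) distance between two windows of equal length."""
    if len(w1) != len(w2):
        raise ValueError("windows must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))


def manhattan(w1: Window, w2: Window) -> float:
    """Geometric (L1) distance between two windows of equal length."""
    if len(w1) != len(w2):
        raise ValueError("windows must have the same length")
    return sum(abs(a - b) for a, b in zip(w1, w2))


def is_change(
    d: Callable[[Window, Window], float], w1: Window, w2: Window, thresh: float
) -> bool:
    """Accept H1 (change) if d(w1, w2) > thresh; otherwise keep H0 (no change)."""
    return d(w1, w2) > thresh
```

With a reference window [20, 21, 22, 21] and a current window [20, 25, 30, 29], the Euclidean distance is 12.0 and the Manhattan distance is 20, so the same pair of windows can be flagged by one detector and not by the other, depending on each detector's threshold.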
  13. Detection Threshold

    • Distinguishes whether an event is 'Changed' or 'Unchanged'
      – User-defined threshold
      – Automatically generated threshold
    • Threshold selection aims to minimize
      – False alarm rate
      – Misdetection rate
    • Threshold selection is challenging
      – Application-dependent
  14. Synopsis-based Change Detector

    • Data summarization using a synopsis
      – The window is too large to fit in memory
      – A synopsis is adequate for detecting the event
    • Change detection in data features
      – Different views on the data
    • Trade-off between
      – Detection accuracy
      – Space efficiency
      – Computational efficiency
    10/8/2013 Synopsis-based Change Detector
  15. Change Detection Algorithm

    1. Init:
         read the first N items into the reference window
           [construct synopsis for reference window];
         read N items into the current window
           [construct synopsis for current window];
    2. Continuous monitoring:
         while not at the end of stream do
           compute distance between the two windows [synopses];
           if distance greater than threshold then
             report change occurred at time t;
             clear all windows and goto step 1;
           else
             slide the current window
               [construct synopsis for current window];
           endif;
         endwhile;
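The algorithm can be turned into a small Python generator. This sketch assumes the current window is filled with the N items that follow the reference window, slides one item at a time, and defaults to the Euclidean distance. `two_window_detector` and its `synopsis` parameter are hypothetical names. Passing a function that returns, say, a few DFT coefficients as `synopsis` gives the bracketed synopsis-based variant.

```python
import math
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_window_detector(
    stream: Iterable[float],
    n: int,
    threshold: float,
    synopsis: Callable[[List[float]], Vector] = list,
    distance: Callable[[Vector, Vector], float] = euclidean,
) -> Iterator[int]:
    """Yield the 0-based stream position t at which each change is reported."""
    it = iter(stream)
    t = -1  # position of the most recently read item
    while True:
        # Step 1 (Init): read N items into each window; a change restarts here.
        reference: List[float] = []
        current: Deque[float] = deque(maxlen=n)
        try:
            while len(reference) < n:
                reference.append(next(it))
                t += 1
            while len(current) < n:
                current.append(next(it))
                t += 1
        except StopIteration:
            return  # the stream ended before both windows were filled
        ref_synopsis = synopsis(reference)
        # Step 2 (Continuous monitoring)
        while True:
            if distance(ref_synopsis, synopsis(list(current))) > threshold:
                yield t  # report change; clear windows and go back to step 1
                break
            try:
                current.append(next(it))  # slide: the deque drops the oldest item
            except StopIteration:
                return
            t += 1
```

For a stream of ten 0.0 readings followed by ten 10.0 readings, with N = 4 and a threshold of 5.0, the generator reports a single change at position 10, the first 10.0 reading.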
  16. Incremental Computation of DFT

    Input: the coefficients $X_k^{old}$ of the window before sliding, $(x_0, x_1, \ldots, x_{N-1})$, the expiring item $x_0$, and the new item $x_N$
    Output: the coefficients $X_k^{new}$ of the window after sliding, $(x_1, \ldots, x_N)$
    For each k from 0 to K-1 do
    $$X_k^{new} = e^{j 2\pi k / N}\left(X_k^{old} + \frac{x_N - x_0}{\sqrt{N}}\right)$$
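The update rule on this slide can be checked numerically. This sketch assumes the normalized DFT $X_k = \frac{1}{\sqrt{N}}\sum_{n=0}^{N-1} x_n e^{-j 2\pi k n / N}$, the convention under which the update holds. `dft` and `slide_dft` are illustrative names. Each slide costs O(K) operations, instead of the O(NK) needed to recompute K coefficients from scratch.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float], num_coeffs: int) -> List[complex]:
    """First num_coeffs coefficients of the normalized DFT of window x (O(N*K))."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)) / math.sqrt(n)
        for k in range(num_coeffs)
    ]


def slide_dft(old: Sequence[complex], x_0: float, x_n: float, n: int) -> List[complex]:
    """Update the coefficients in O(K) when x_0 leaves and x_n enters a window of size n."""
    return [
        cmath.exp(2j * math.pi * k / n) * (old[k] + (x_n - x_0) / math.sqrt(n))
        for k in range(len(old))
    ]


window = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0, 2.0]
new_item = 7.0
incremental = slide_dft(dft(window, 3), window[0], new_item, len(window))
direct = dft(window[1:] + [new_item], 3)
print(max(abs(a - b) for a, b in zip(incremental, direct)))  # only floating-point error
```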
  17. Detection Accuracy

    • Probability of hits (hit rate)
    • Probability of false alarms (false alarm rate)
    • Precision
    • Recall
    $$\text{Hit rate} = \frac{\text{Hits}}{\text{Hits} + \text{Misses}}, \qquad \text{FA rate} = \frac{\text{False alarms}}{\text{False alarms} + \text{Correct rejections}}$$
    $$\text{Precision} = \frac{\text{Hits}}{\text{Hits} + \text{False alarms}}, \qquad \text{Recall} = \frac{\text{Hits}}{\text{Hits} + \text{Misses}}$$
  18. Computation of Detection Accuracy

                        Detected change     Detected no change          Total
    Actual change       Hits 132            Misses 18                   150
    Actual no change    False alarms 194    Correct rejections 51629    51823
    Total               326                 51647                       51973

    Hit rate = 132 / (132 + 18) = 0.88
    FA rate = 194 / (194 + 51629) = 0.0037
    Precision = 132 / (132 + 194) = 0.4049
    Recall = 132 / (132 + 18) = 0.88
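The four measures can be computed directly from the confusion-matrix counts. A minimal sketch with an illustrative function name, applied to the counts in the table:

```python
from typing import Dict


def detection_accuracy(
    hits: int, misses: int, false_alarms: int, correct_rejections: int
) -> Dict[str, float]:
    """Hit rate, false alarm rate, precision, and recall from a confusion matrix."""
    return {
        "hit_rate": hits / (hits + misses),
        "fa_rate": false_alarms / (false_alarms + correct_rejections),
        "precision": hits / (hits + false_alarms),
        "recall": hits / (hits + misses),  # recall equals the hit rate
    }


accuracy = detection_accuracy(hits=132, misses=18, false_alarms=194, correct_rejections=51629)
for name, value in accuracy.items():
    print(f"{name}: {value:.4f}")
```

Running it prints hit_rate 0.8800, fa_rate 0.0037, precision 0.4049 and recall 0.8800, matching the slide. The low precision despite the low FA rate arises because the no-change class (51823 cases) is much larger than the change class (150 cases).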
  19. ROC and PR Graphs

    • Receiver Operating Characteristic (ROC) graph: a two-dimensional graph with the hit rate on the y-axis and the FA rate on the x-axis; the ideal point is the upper-left corner
    • Precision-Recall (PR) graph: a two-dimensional graph with precision on the y-axis and recall on the x-axis; the ideal point is the upper-right corner
  20. Detectability of Distance Functions

    • Plotting the ROC graph
      – Overlapping windows model
      – Window size fixed to 128
      – For each distance-based detector:
        set the absolute threshold to 30°C;
        vary the distance-based threshold;
        compute a (hit rate, FA rate) pair for each distance-based threshold;
        plot the (hit rate, FA rate) pairs in ROC space
    • Plotting the PR graph
  21. Detectability of Distance Functions

    [Figure: ROC space]
    • The Euclidean detector performs better than the Manhattan detector in ROC space
    • The Euclidean and Manhattan detectors perform similarly in PR space
  22. Synopsis-based Detector Preserving Detection Accuracy

    • The detection accuracy of a synopsis-based detector is preserved if the distance is preserved under synopsis construction
    • Transformations preserving the Euclidean distance
      – Discrete Fourier Transform
      – Haar Wavelet Transform
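The distance-preservation property can be checked numerically for both transforms. The sketch uses the orthonormal (unitary) versions, namely the DFT scaled by $1/\sqrt{N}$ and the Haar transform with $1/\sqrt{2}$ scaling, and requires the window length to be a power of two for the Haar case. The function names are illustrative.

```python
import cmath
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Full normalized (unitary) DFT of x."""
    n = len(x)
    return [
        sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)) / math.sqrt(n)
        for k in range(n)
    ]


def haar(x: Sequence[float]) -> List[float]:
    """Orthonormal Haar wavelet transform; len(x) must be a power of two."""
    out = list(x)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("window length must be a power of two")
    while n > 1:
        half = n // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = averages + details  # the next level transforms the averages only
        n = half
    return out


def distance(a: Sequence[complex], b: Sequence[complex]) -> float:
    """Euclidean distance; works for real and complex coefficients."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


w1 = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 0.0, 2.0]
w2 = [2.0, 2.0, 2.0, 6.0, 1.0, 6.0, 3.0, 2.0]
print(distance(w1, w2), distance(dft(w1), dft(w2)), distance(haar(w1), haar(w2)))
```

All three distances agree up to floating-point error, so a threshold tuned on raw windows carries over unchanged to the transformed windows.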
  23. Automated Change Detection and Reactive Clustering

    • Event detection based on multivariate data
      – Fire is detected from an increase in temperature and light intensity together with a decrease in humidity
    • Automated change detection
      – Threshold adapts to the environment
      – Automatically generated threshold
    • Building and maintenance of the clustering
      – Nearby sensors have highly correlated readings
      – A changing stream requires an evolving clustering
  24. Change Criteria

    • A change is declared if new points do not fit the current clustering $\{C_1, \ldots, C_K\}$
    • A change point:
    $$\mathit{change}(x) \iff \forall i \in \{1, \ldots, K\}: d\big(x, \mathit{center}(C_i)\big) > \mathit{radius}(C_i)$$
    • A block of points:
    $$\mathit{change}(x_1, x_2, \ldots, x_b) \iff \bigwedge_{i=1}^{b} \mathit{change}(x_i)$$
    10/8/2013 Change Detection and Reactive Clustering
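The two change criteria translate directly into code. In this sketch a clustering is assumed to be a list of clusters, each summarized by a center and a radius. The `Cluster` class and the function names are hypothetical, and `math.dist` (Python 3.8+) supplies the Euclidean distance d.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Cluster:
    center: Tuple[float, ...]
    radius: float


def point_change(x: Sequence[float], clustering: Sequence[Cluster]) -> bool:
    """x is a change point if it lies outside every cluster of the current clustering."""
    return all(math.dist(x, c.center) > c.radius for c in clustering)


def block_change(block: Sequence[Sequence[float]], clustering: Sequence[Cluster]) -> bool:
    """A block of b points signals a change only if every point in it is a change point."""
    return all(point_change(x, clustering) for x in block)


clustering = [Cluster((0.0, 0.0), 1.0), Cluster((10.0, 10.0), 2.0)]
print(point_change((0.5, 0.5), clustering))  # False: inside the first cluster
print(block_change([(5.0, 5.0), (6.0, 4.0)], clustering))  # True: no point fits a cluster
```

Requiring a whole block to fall outside the clustering makes the detector less sensitive to a single outlying reading, at the cost of up to b items of detection delay.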
  25. Change Detection and Reactive Clustering

    1. Init:
         read the first N items into the reference window;
         create the clustering in the reference window;
         assign the content of the reference window, slid by one step, to the current window;
         get new item from the current window;
    2. Continuous monitoring:
         while not at the end of stream do
           if change(new item) then
             report change occurred at time t;
             assign the current window to the reference window;
             rebuild the clustering in the reference window;
           endif;
           slide the current window;
           get new item from the current window;
         endwhile;
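A runnable sketch of the reactive clustering loop follows. The slides do not specify the clustering algorithm, so this sketch substitutes plain k-means (Lloyd's iterations with deterministic farthest-point initialization) and takes each cluster's radius to be the distance to its farthest member. `kmeans`, `reactive_clustering_detector` and the parameter `k` are hypothetical, and only the single-point change criterion is used.

```python
import math
from collections import deque
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def _assign(points: Sequence[Point], centers: Sequence[Point]) -> List[List[Point]]:
    """Group points by their nearest center."""
    groups: List[List[Point]] = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points: Sequence[Point], k: int, iters: int = 10) -> List[Tuple[Point, float]]:
    """Return (center, radius) pairs; the radius is the distance to the farthest member."""
    centers = [points[0]]
    while len(centers) < min(k, len(points)):
        # Farthest-point initialization: add the point farthest from all chosen centers.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = _assign(points, centers)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    groups = _assign(points, centers)
    return [(c, max((math.dist(p, c) for p in g), default=0.0)) for c, g in zip(centers, groups)]


def reactive_clustering_detector(stream: Iterable[Point], n: int, k: int) -> Iterator[int]:
    """Yield the 0-based position of each new item that fits no cluster."""
    it = iter(stream)
    reference = [p for _, p in zip(range(n), it)]
    if len(reference) < n:
        return  # not enough data for the reference window
    clustering = kmeans(reference, k)
    # The current window is the reference window slid by one step.
    current = deque(reference[1:], maxlen=n)
    for t, x in enumerate(it, start=n):
        current.append(x)  # slide the current window; x is the new item
        if all(math.dist(x, center) > radius for center, radius in clustering):
            yield t  # report change at time t
            reference = list(current)  # the current window becomes the reference
            clustering = kmeans(reference, k)  # rebuild the clustering
```

Right after a change, the rebuilt reference window holds only a few points from the new regime, so this sketch may report several consecutive changes while the new clusters form.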
  26. Distributed Change Detection

    • Three dimensions of the solution: local detector, network architecture, decision fusion strategies
    • Requirements
      – Reduction in communication overhead
      – Assurance of detection accuracy
    • Trade-off
      – Detection accuracy, detection delay, computation
      – Complexity, space, communication cost
    10/8/2013 Distributed Change Detection
  27. Distributed Change Detection

    [Figure: local change detection (Change Detection in Raw Data, Synopsis-based Change Detection, Model-based Change Detection) feeding Decision Fusion and Global Change Detection]
    • Data come from multiple sources
  28. Decision Fusion Rules

    • Local decisions: $u_i = 0$ (no change), $u_i = 1$ (change); $P_d^i$ is the hit rate and $P_{fa}^i$ the false alarm rate of local detector i
    • Change criteria, where T is the global detection threshold
      – Fusion with detection accuracy:
    $$\sum_{i=1}^{N}\left[u_i \ln\frac{P_d^i}{P_{fa}^i} + (1 - u_i)\ln\frac{1 - P_d^i}{1 - P_{fa}^i}\right] \underset{H_0}{\overset{H_1}{\gtrless}} T$$
      – Fusion without detection accuracy:
    $$\sum_{i=1}^{M} u_i \underset{H_0}{\overset{H_1}{\gtrless}} T$$
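Both fusion rules can be sketched as functions of the local decisions $u_i$. Assumptions: ties with T are resolved as "no change" (the slide does not say), every $P_d^i$ and $P_{fa}^i$ lies strictly between 0 and 1 so the logarithms are defined, and the function names are hypothetical.

```python
import math
from typing import Sequence


def fuse_with_accuracy(
    u: Sequence[int], p_d: Sequence[float], p_fa: Sequence[float], T: float
) -> int:
    """Weighted log-likelihood-ratio fusion: 1 (change) if the sum exceeds T, else 0."""
    total = 0.0
    for u_i, pd_i, pfa_i in zip(u, p_d, p_fa):
        if u_i == 1:
            # A "change" vote weighs more when the detector's hit rate is high
            # relative to its false alarm rate.
            total += math.log(pd_i / pfa_i)
        else:
            # A "no change" vote contributes a negative term whenever P_d > P_fa.
            total += math.log((1 - pd_i) / (1 - pfa_i))
    return 1 if total > T else 0


def fuse_without_accuracy(u: Sequence[int], T: float) -> int:
    """Counting rule: 1 (change) if more than T local detectors report a change."""
    return 1 if sum(u) > T else 0


p_d = [0.9, 0.8, 0.7]
p_fa = [0.05, 0.1, 0.2]
print(fuse_with_accuracy([1, 1, 0], p_d, p_fa, T=0.0))
print(fuse_without_accuracy([1, 1, 0], T=1))
```

The first rule needs each local detector's $P_d^i$ and $P_{fa}^i$; the counting rule needs none of them but treats every detector as equally reliable.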
  29. Contributions

    • Developed a framework for detecting changes in distributed streaming data
      – Local detector
      – Decision fusion
    • Developed change detection algorithms for streaming data using the two-windows model
      – Change detector for raw streaming data
      – Synopsis-based change detector
      – Automated clustering-based detector for multivariate data
    • Developed a DFT-based detector
      – Incremental computation of DFT coefficients
      – The detection accuracy of the synopsis-based detector is preserved if the distance is preserved under the synopsis construction process
  30. Future Work

    • Developing change detection methods for other synopses, such as frequent patterns, sampling, histograms, and wavelets
    • Developing change detection for sparse data streams, or detection of rare changes
    • Developing distributed change detection that does not use detection accuracy