Anomaly Detection. Part 2 – Statistical Methods

Exactpro
May 30, 2022

Rostislav Yavorski
Head of Research, Exactpro

“In Lecture 2, we are going to discuss the graphical methods: histogram, box plot, and scatter plot, as well as interquartile range, Tukey's fences, and null hypothesis, t-statistic, p-value.”

AI Testing Talks – Anomaly Detection. 30 May 2022

https://exactpro.com/events/external/ai-testing-talks-anomaly-detection?utm_source=speakerdeck&utm_medium=Refferer&utm_campaign=statistical-methods

---

Follow us on
LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro

Transcript

  1. AI Testing Talks: BUILD SOFTWARE TO TEST SOFTWARE (exactpro.com)

    ANOMALY DETECTION FOR AI TESTING. Lecture 2. Statistical Methods. Rostislav Yavorski, Head of Research, Exactpro. 30 MAY | 10.00 GET | 11.30 SLST
  2. Terms

    An outlier is a data point that differs significantly from other observations. Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.
  3. Plan

    1. Graphical Methods
    2. Interquartile Range
    3. Tukey's Fences
    4. Seasonal and Trend Decomposition (STL)
    5. Statistical Hypothesis Test
    6. p-value and t-statistic
    7. SciPy library
  4. Histogram

    First, divide the entire range of values into a series of intervals ("bins" or "buckets"), and then count how many values fall into each interval.
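
The binning step described on the histogram slide can be sketched with NumPy. This is a minimal illustration, not code from the lecture: the sample values, the choice of 8 equal-width bins, and the deliberately odd point 42 are all assumptions.

```python
import numpy as np

# Illustrative values (an assumption); 42 is a deliberately odd point.
values = np.array([3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10, 42])

# Split the range [min, max] into 8 equal-width bins and count the values in each.
counts, edges = np.histogram(values, bins=8)

# Text rendering of the histogram; note that NumPy closes the last bin on the right.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'#' * int(count)} ({int(count)})")
```

A populated bin that is separated from the bulk of the data by several empty bins, as the last bin is here, is a visual hint that the values in it may be outliers.
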
  5. Scatter Plot

    A scatter chart displays the relationship between 2 numeric variables. The position of each dot on the horizontal and vertical axes indicates the values for a data point.

    | Temperature °C | Ice Cream Sales |
    |---|---|
    | 14.2° | $215 |
    | 16.4° | $325 |
    | 11.9° | $185 |
    | 18.5° | $406 |
    | 22.1° | $522 |
    | 19.4° | $412 |
    | 25.1° | $614 |
    | 23.4° | $544 |
    | 15.2° | $332 |
    | 18.1° | $421 |
    | 22.6° | $445 |
    | 17.2° | $408 |

    Source: https://www.mathsisfun.com/data/scatter-xy-plots.html
  6. Quartile

    Q1 is the middle number between the minimum and the median of the data set. Q2 (median) is the value separating the higher half from the lower half of a set. Q3 is the middle value between the median and the maximum of the data set.
  7. Quartile

    • Q1, the first quartile: 25% of the data lies below this point.
    • Q2, the second quartile: 50% of the data lies below this point (it is the median).
    • Q3, the third quartile: 75% of the data lies below this point.

    Figure: Q1, Q2 and Q3 split the data into four parts, each holding ¼ of the data.
  8. Quartile

    Raw data: 3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10
    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

    Figure: the ordered data is split into a lower half and an upper half.
  9. Quartile

    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

    Figure: the lower half, the upper half, min, Q1, the median (Q2), Q3 and max are marked on the ordered data.
  10. Quartile: five-number summary

    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

    Figure: min, Q1, the median (Q2), Q3 and max, marked on the ordered data, form the five-number summary.
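
The five-number summary for the deck's sample can be reproduced with a few lines of Python. This sketch assumes the "median of halves" rule suggested by the slides, where Q1 and Q3 are the medians of the lower and upper halves and, for an odd count, the median itself belongs to neither half. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd count, the median itself is left out of both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

raw = [3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10]
print(five_number_summary(raw))  # (2, 3, 6, 9, 10)
```

Other quartile conventions exist (for example, interpolated percentiles), and they can give slightly different Q1 and Q3 on other data sets.
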
  11. Tukey's Fences

    An outlier is any observation outside the range [ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ], where
    • Q1 and Q3 are the lower and upper quartiles
    • k is some non-negative constant

    John Tukey proposed that
    • k = 1.5 indicates an "outlier", and
    • k = 3 indicates data that is "far out"

    John Wilder Tukey (1915 – 2000)
  12. Tukey's Fences

    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

    Q1 = 3, Q3 = 9
    Interquartile range: Q3 - Q1 = 9 - 3 = 6
    Lower outlier limit = Q1 - 1.5(Q3 - Q1) = 3 - 1.5×6 = -6
    Upper outlier limit = Q3 + 1.5(Q3 - Q1) = 9 + 1.5×6 = 18

    [ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ]
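
The fence calculation above can be sketched in Python as follows. The helper names `tukey_fences` and `classify`, the label "normal", and the three new observations are assumptions made for illustration. Quartiles come from `numpy.percentile`, whose default linear interpolation happens to give Q1 = 3 and Q3 = 9 for this sample but may differ from the median-of-halves rule on other data.

```python
import numpy as np

def tukey_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def classify(x, data):
    """Label x against fences built from data (k = 1.5 and k = 3)."""
    far_low, far_high = tukey_fences(data, k=3.0)
    low, high = tukey_fences(data, k=1.5)
    if x < far_low or x > far_high:
        return "far out"
    if x < low or x > high:
        return "outlier"
    return "normal"

data = [2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10]
low, high = tukey_fences(data)
print(low, high)  # -6.0 18.0

# Hypothetical new observations checked against the fences.
for x in (5, 20, 30):
    print(x, classify(x, data))
```
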
  13. Boxplot: five-number summary

    Figure: a box plot marking the minimum, first quartile, median, third quartile and maximum, with the IQR spanning the box. Source: https://math.fandom.com/wiki/Box_Plot
  14. Boxplot: five-number summary

    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

    Figure: the lower half, the upper half, min, Q1, the median (Q2), Q3 and max are marked on the ordered data. Source: https://thestatsninja.com/2019/02/07/the-box-and-whisker-plot-for-grown-ups/
  15. Seasonal-Trend Decomposition using LOESS (STL)

    STL decomposes a time series into three components, using the LOESS method:
    • trend
    • seasonal
    • residual (noise)

    LOESS = LOcally EStimated Scatterplot Smoothing

    Figure: LOESS curve approximation.
  16. STL decomposition

    Raw data = Trend + Seasonal + Remainder

    Source: https://otexts.com/fpp2/stl.html
  17. Example

    Monthly airline passengers during the years 1949-1960. Source: "Anomaly Detection using STL", https://medium.com/wwblog/anomaly-detection-using-stl-76099c9fd5a7
  18. Statistical Hypothesis

    A statistical hypothesis test is a method used to decide whether the data at hand support a particular hypothesis.

    Null Hypothesis (H0) and the Alternative Hypothesis (HA):
    • H0: The observed difference is due to chance alone. There are no anomalies.
    • HA: Parameters of the distribution have changed. There is an anomaly.
  19. p-value

    The p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. It is used to quantify the statistical significance of a result. A small p-value means that the observed outcome would be unlikely under the null hypothesis. Small p-values are strong evidence against the null hypothesis.
  20. p-value

    Figure: probability density over the set of possible results, with the more likely observations in the centre, an "Anomaly" region in each tail, the observed data point, and the p-value marked.
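
For a single observed data point, the p-value can be sketched with SciPy under one strong assumption: that the null hypothesis describes the data as normally distributed with a known mean and standard deviation. The numbers below (mean 100, standard deviation 10) and the helper name `two_sided_p_value` are hypothetical.

```python
from scipy.stats import norm

def two_sided_p_value(x, mean, std):
    """Probability, under N(mean, std), of a value at least as far from the mean as x."""
    z = (x - mean) / std
    # norm.sf is the upper-tail probability; doubling it covers both tails.
    return 2 * norm.sf(abs(z))

p_typical = two_sided_p_value(105, mean=100, std=10)  # half a standard deviation away
p_extreme = two_sided_p_value(130, mean=100, std=10)  # three standard deviations away
print(round(p_typical, 3), round(p_extreme, 4))  # 0.617 0.0027
```

With a conventional threshold such as 0.05 (also an assumption here), the second point would be treated as evidence against the null hypothesis, that is, as a candidate anomaly.
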
  21. t-statistic

    The ratio of the departure of the estimated value of a parameter from its hypothesised value to its standard error: t = (estimated value - hypothesised value) / standard error. It is used along with the p-value when running hypothesis tests, where the p-value tells us how likely it would be to see results at least this extreme if the null hypothesis were true.
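
As an illustration of the definition above, the t-statistic for a sample mean can be computed by hand and compared with `scipy.stats.ttest_1samp`. The baseline value of 100 and the sample measurements are made up for this sketch; here the estimated parameter is the sample mean, and its standard error is the sample standard deviation divided by the square root of the sample size.

```python
import numpy as np
from scipy import stats

baseline_mean = 100.0  # hypothesised value under H0 (assumed for illustration)
sample = np.array([108, 111, 105, 109, 112, 107, 110, 106], dtype=float)  # made-up data

# t = (estimated value - hypothesised value) / standard error
standard_error = sample.std(ddof=1) / np.sqrt(len(sample))
t_manual = (sample.mean() - baseline_mean) / standard_error

# SciPy computes the same statistic together with a two-sided p-value.
result = stats.ttest_1samp(sample, popmean=baseline_mean)
print(t_manual, result.statistic, result.pvalue)
```

A large absolute t-statistic paired with a small p-value suggests that the sample mean has moved away from the baseline, which in the lecture's terms points to HA rather than H0.
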
  22. SciPy

    SciPy provides algorithms for
    • optimisation
    • integration
    • interpolation
    • eigenvalue problems
    • algebraic equations
    • differential equations
    • statistics
    and many other classes of problems.

    https://scipy.org/
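
As a small taste of the statistics part of SciPy, the interquartile range of the deck's sample data can be obtained with `scipy.stats.iqr`. This is only a sketch. The function uses interpolated percentiles by default, which for this sample agrees with the quartiles Q1 = 3 and Q3 = 9 used earlier in the deck.

```python
from scipy import stats

data = [3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10]

# Interquartile range: Q3 - Q1.
spread = stats.iqr(data)
print(spread)  # 6.0
```
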