Anomaly Detection. Part 2 – Statistical Methods

Slide 1. Anomaly Detection for AI Testing. Lecture 2. Statistical Methods
BUILD SOFTWARE TO TEST SOFTWARE | AI Testing Talks | exactpro.com
Rostislav Yavorski, Head of Research, Exactpro
30 MAY | 10.00 GET | 11.30 SLST

Slide 2. Terms
An outlier is a data point that differs significantly from other observations.
Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.

Slide 3. Plan
1. Graphical Methods
2. Interquartile Range
3. Tukey's Fences
4. Seasonal and Trend Decomposition (STL)
5. Statistical Hypothesis Test
6. p-value and t-statistic
7. SciPy library

Slide 4. Graphical Methods

Slide 5. Histogram
To build a histogram, first divide the entire range of values into a series of intervals ("bins" or "buckets"), then count how many values fall into each interval.
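
The binning step described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `histogram`, the choice of equal-width bins and the bin count of 4 are assumptions, and the sample values are the raw data used later in the quartile slides.

```python
def histogram(values, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; cannot build bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin instead of a bin of its own.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    # Bin edges: n_bins + 1 boundaries from the minimum to the maximum.
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


data = [3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10]
edges, counts = histogram(data, n_bins=4)
print(edges)
print(counts)
```

A bin whose count is far lower than its neighbours, or an isolated bin far from the bulk of the data, is the kind of visual cue a histogram gives when looking for outliers.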

Slide 7. Scatter Plot
A scatter chart displays the relationship between two numeric variables. The position of each dot on the horizontal and vertical axes indicates the values for a data point.

Temperature (°C)    Ice Cream Sales
14.2                $215
16.4                $325
11.9                $185
18.5                $406
22.1                $522
19.4                $412
25.1                $614
23.4                $544
15.2                $332
18.1                $421
22.6                $445
17.2                $408

Source: https://www.mathsisfun.com/data/scatter-xy-plots.html

Slide 9. Interquartile Range

Slide 10. Quartile
• Q1 is the middle number between the minimum and the median of the data set.
• Q2 (median) is the value separating the higher half from the lower half of the data set.
• Q3 is the middle value between the median and the maximum of the data set.

Slide 11. Quartile
• Q1, the first quartile: 25% of the data lies below this point.
• Q2, the second quartile: 50% of the data lies below this point (it is the median).
• Q3, the third quartile: 75% of the data lies below this point.
The three quartiles split the ordered data into four parts, each holding ¼ of the data.

Slide 12. Quartile
Raw data: 3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10
Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
The median splits the ordered data into a lower half and an upper half.

Slide 13. Quartile
Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
min = 2, Q1 = 3 (median of the lower half), median (Q2) = 6, Q3 = 9 (median of the upper half), max = 10

Slide 14. Quartile
Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
Five-number summary: min = 2, Q1 = 3, median (Q2) = 6, Q3 = 9, max = 10
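
The five-number summary can be reproduced with the standard library. The sketch below assumes the "median of each half" convention used on the slides, with the median itself left out of both halves when the number of values is odd. Other quartile conventions (for example, interpolation-based ones) can give slightly different Q1 and Q3. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # values before the median position
    upper = ordered[n // 2 + n % 2 :]    # values after the median position
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10]
summary = five_number_summary(data)
print(summary)
```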

Slide 15. Tukey's Fences
An outlier is any observation outside the range
[ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ]
where
• Q1 and Q3 are the lower and upper quartiles
• k is some non-negative constant
John Tukey proposed that
• k = 1.5 indicates an "outlier", and
• k = 3 indicates data that is "far out"
John Wilder Tukey (1915 – 2000)

Slide 16. Tukey's Fences
Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
Q1 = 3, Q3 = 9
Interquartile range: Q3 - Q1 = 9 - 3 = 6
Lower outlier limit = Q1 - 1.5(Q3 - Q1) = 3 - 1.5 × 6 = -6
Upper outlier limit = Q3 + 1.5(Q3 - Q1) = 9 + 1.5 × 6 = 18
Every value lies within [-6, 18], so this data set contains no outliers.
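
The fence calculation above translates directly into code. This is a minimal sketch using the same median-of-halves quartiles as the slides. The function names and the extra value 25 appended in the second call are hypothetical, added only to show a point that falls outside the fences.

```python
from statistics import median


def tukey_fences(values, k=1.5):
    """Return (lower, upper) limits: [Q1 - k*IQR, Q3 + k*IQR]."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[n // 2 + n % 2 :])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values that lie outside Tukey's fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


data = [3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10]
fences = tukey_fences(data)                 # limits for the slide's data
no_outliers = find_outliers(data)           # nothing falls outside the fences
with_spike = find_outliers(data + [25])     # 25 is a made-up extra observation
print(fences, no_outliers, with_spike)
```

Passing `k=3` instead of the default gives the wider "far out" fences.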

Slide 17. Boxplot: five-number summary
[Figure: boxplot]
Source: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

Slide 18. Boxplot: five-number summary
[Figure: boxplot with labels Minimum, First Quartile, Median, Third Quartile, Maximum and IQR]
Source: https://math.fandom.com/wiki/Box_Plot

Slide 19. Boxplot: five-number summary
Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
[Figure: boxplot marking min, Q1, median (Q2), Q3 and max]
Source: https://thestatsninja.com/2019/02/07/the-box-and-whisker-plot-for-grown-ups/

Slide 20. Boxplot: five-number summary
[Figure: boxplot]
Source: https://www.simplypsychology.org/boxplots.html

Slide 21. Seasonal and Trend Decomposition (STL)

Slide 22. Seasonal-Trend Decomposition using LOESS (STL)
STL decomposes a time series into three components using the LOESS method:
• trend
• seasonal
• residual (noise)
LOESS = LOcally EStimated Scatterplot Smoothing
[Figure: LOESS curve approximation]

Slide 23. Raw data = Trend + Seasonal + Remainder
[Figure: a time series and its trend, seasonal and remainder components]
Source: https://otexts.com/fpp2/stl.html

Slide 24. Anomaly Detection using STL
Example: monthly airline passengers during the years 1949-1960
Source: https://medium.com/wwblog/anomaly-detection-using-stl-76099c9fd5a7

Slide 25. Anomaly Detection using STL
Example: web traffic data
Source: https://www.webfx.com/blog/web-design/how-much-traffic-can-your-website-handle/
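
The idea behind these examples is to split the series into trend, seasonal and residual parts and then flag unusually large residuals. The sketch below is not STL itself: to stay dependency-free it swaps the LOESS smoother for a classical centred moving average, which is an assumption made for illustration. The function names, the synthetic series, the injected spike and the 3-standard-deviation threshold are all hypothetical.

```python
from statistics import mean, stdev


def decompose(series, period):
    """Additive split: series = trend + seasonal + residual.

    Trend is a centred moving average (not LOESS). The first and last
    period // 2 points have no trend estimate and are left as None.
    Assumes the series covers at least two full periods.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half : i + half + 1]
        total = sum(window)
        if period % 2 == 0:
            # Even period: give the two end points half weight each.
            total -= 0.5 * (window[0] + window[-1])
        trend[i] = total / period
    # Seasonal component: mean detrended value for each position in the cycle.
    phase_means = []
    for p in range(period):
        pairs = zip(series[p::period], trend[p::period])
        phase_means.append(mean(s - t for s, t in pairs if t is not None))
    offset = mean(phase_means)  # centre the seasonal component on zero
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    residual = [
        None if t is None else s - t - c
        for s, t, c in zip(series, trend, seasonal)
    ]
    return trend, seasonal, residual


def residual_anomalies(residual, threshold=3.0):
    """Indices whose residual is more than `threshold` std devs from the mean."""
    known = [r for r in residual if r is not None]
    mu, sigma = mean(known), stdev(known)
    return [
        i for i, r in enumerate(residual)
        if r is not None and abs(r - mu) > threshold * sigma
    ]


# Synthetic data: linear trend plus a 4-step seasonal pattern, one injected spike.
pattern = [0, 2, 0, -2]
series = [i + pattern[i % 4] for i in range(24)]
series[13] += 10
trend, seasonal, residual = decompose(series, period=4)
anomalies = residual_anomalies(residual)
print(anomalies)
```

For real data, the `STL` class in `statsmodels.tsa.seasonal` provides the LOESS-based decomposition, and the same residual check can be applied to its output.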

Slide 26. Statistical Hypothesis Test

Slide 27. Statistical Hypothesis
A statistical hypothesis test is a method used to decide whether the data at hand support a particular hypothesis.
Null Hypothesis (H0) and Alternative Hypothesis (HA):
• H0: The observed difference is due to chance alone. There are no anomalies.
• HA: Parameters of the distribution have changed. There is an anomaly.

Slide 28. p-value
The p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
The p-value is used to quantify the statistical significance of a result.
A small p-value means that the observed outcome would be unlikely under the null hypothesis. Small p-values are strong evidence against the null hypothesis.

Slide 29. p-value
[Figure: probability density over the set of possible results, with labels: more likely observations, observed data point, p-value, and anomaly in each tail]
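
As a rough illustration of the definition, the sketch below computes a two-sided p-value for a single observation. It assumes the null hypothesis is a normal distribution with known mean and standard deviation. That assumption, the function name and the numbers (mean 100, standard deviation 5, observations 102 and 120) are all made up for the example.

```python
from statistics import NormalDist


def two_sided_p_value(observed, mean, std):
    """P(|X - mean| >= |observed - mean|) for X ~ Normal(mean, std)."""
    null = NormalDist(mu=mean, sigma=std)
    distance = abs(observed - mean)
    # Upper-tail probability, doubled because the normal density is symmetric.
    return 2 * (1 - null.cdf(mean + distance))


p_typical = two_sided_p_value(102, mean=100, std=5)   # close to the centre
p_extreme = two_sided_p_value(120, mean=100, std=5)   # four std devs away
print(p_typical, p_extreme)
```

The observation near the centre gets a large p-value, while the one four standard deviations out gets a very small one and would be a candidate anomaly.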

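Slide 30. t-statistic
The t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesised value to its standard error:
$t = \dfrac{\hat{\beta} - \beta_0}{\mathrm{SE}(\hat{\beta})}$
where $\hat{\beta}$ is the estimated value, $\beta_0$ is the hypothesised value and $\mathrm{SE}(\hat{\beta})$ is the standard error of the estimate.
It is used along with the p-value when running hypothesis tests: the p-value tells us how probable a result at least this extreme would be if the null hypothesis were true.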
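
For the common case where the parameter is a population mean, the ratio can be computed by hand. This is a minimal sketch of a one-sample t-statistic. The function name and the sample values are hypothetical, and the standard error is taken as the sample standard deviation divided by the square root of the sample size.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, hypothesised_mean):
    """(sample mean - hypothesised mean) / standard error of the mean."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - hypothesised_mean) / standard_error


sample = [2, 4, 6, 8]                  # made-up measurements
t_same = t_statistic(sample, 5)        # hypothesised mean equals the sample mean
t_shifted = t_statistic(sample, 3)     # hypothesised mean is lower than observed
print(t_same, t_shifted)
```

The further the t-statistic is from zero, the smaller the corresponding p-value.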

Slide 31. SciPy
SciPy provides algorithms for
• optimisation
• integration
• interpolation
• eigenvalue problems
• algebraic equations
• differential equations
• statistics
and many other classes of problems.
https://scipy.org/
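
As one example of the statistics part, `scipy.stats.ttest_1samp` runs a one-sample t-test and returns both the t-statistic and the p-value. The sketch below assumes SciPy is installed. The baseline of 100, the sample values and the 0.05 significance level are hypothetical.

```python
from scipy import stats

baseline_mean = 100.0  # hypothetical expected value under the null hypothesis
# Hypothetical measurements, e.g. response times in milliseconds.
sample = [101.2, 99.8, 100.5, 103.9, 104.4, 102.7, 105.1, 103.3]

# H0: the sample comes from a distribution whose mean is baseline_mean.
result = stats.ttest_1samp(sample, popmean=baseline_mean)
reject_null = result.pvalue < 0.05

print(result.statistic, result.pvalue, reject_null)
```

Here the sample mean sits well above the baseline relative to its standard error, so the test rejects the null hypothesis at the chosen level.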

Slide 32. Thank you! Questions?
