
# Anomaly Detection. Part 2 – Statistical Methods

Rostislav Yavorski
Head of Research, Exactpro

“In Lecture 2, we are going to discuss the graphical methods: histogram, box plot, and scatter plot, as well as interquartile range, Tukey's fences, and null hypothesis, t-statistic, p-value.”

AI Testing Talks – Anomaly Detection. 30 May 2022

https://exactpro.com/events/external/ai-testing-talks-anomaly-detection?utm_source=speakerdeck&utm_medium=Refferer&utm_campaign=statistical-methods

---

May 30, 2022

## Transcript

1. ### 1 BUILD SOFTWARE TO TEST SOFTWARE AI Testing Talks

exactpro.com

Lecture 2. Statistical Methods

ANOMALY DETECTION FOR AI TESTING

Rostislav Yavorski, Head of Research, Exactpro

30 MAY | 10.00 GET | 11.30 SLST
2. ### 2 Terms

An outlier is a data point that differs significantly from other observations.

Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.
3. ### 3 Plan

    1. Graphical Methods
    2. Interquartile Range
    3. Tukey's Fences
    4. Seasonal and Trend Decomposition (STL)
    5. Statistical Hypothesis Test
    6. p-value and t-statistic
    7. SciPy library

4. ### 4 Graphical Methods
5. ### 5 Histogram

First, divide the entire range of values into a series of intervals ("bins" or "buckets"), and then count how many values fall into each interval.
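
The binning step described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `histogram`, the equal-width binning rule, and the sample values are assumptions, not taken from the lecture.

```python
def histogram(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values to build bins")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Every bin is half-open [left, right) except the last one,
        # which also contains the maximum, hence the clamp.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical sample: mostly small values plus one suspiciously large one.
values = [2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10, 42]
counts, edges = histogram(values, bins=5)

# Text rendering of the histogram: one row per bin.
for count, left, right in zip(counts, edges, edges[1:]):
    print(f"{left:5.1f} .. {right:5.1f} | {'#' * count}")
```

In the printed chart the value 42 sits alone in the last bin, separated from the rest by empty bins, which is the visual cue a histogram gives for a possible outlier.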

7. ### 7 Scatter Plot

A scatter chart displays the relationship between 2 numeric variables. The position of each dot on the horizontal and vertical axes indicates values for a data point.

| Temperature °C | Ice Cream Sales |
|---|---|
| 14.2° | \$215 |
| 16.4° | \$325 |
| 11.9° | \$185 |
| 18.5° | \$406 |
| 22.1° | \$522 |
| 19.4° | \$412 |
| 25.1° | \$614 |
| 23.4° | \$544 |
| 15.2° | \$332 |
| 18.1° | \$421 |
| 22.6° | \$445 |
| 17.2° | \$408 |

https://www.mathsisfun.com/data/scatter-xy-plots.html

Interquartile Range
10. ### 10 Quartile

Q1 is the middle number between the minimum and the median of the data set. Q2 (median) is the value separating the higher half from the lower half of a set. Q3 is the middle value between the median and the maximum of the data set.
11. ### 11 Quartile

Q1, the first quartile: 25% of the data lies below this point. Q2, the second quartile: 50% of the data lies below this point (it is the median). Q3, the third quartile: 75% of the data lies below this point.

Figure labels: Q1 (1st quartile), Q2 (median), Q3 (3rd quartile); each of the four segments holds ¼ of the data.
12. ### 12 Quartile

Raw data: 3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10

Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

Lower half: 2, 2, 3, 3, 3, 4, 4. Upper half: 8, 8, 9, 9, 9, 10, 10.
13. ### 13 Quartile

Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

Figure labels: lower half, upper half; min, Q1, median (Q2), Q3, max.
14. ### 14 Quartile

Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

Five-number summary: min, Q1, median (Q2), Q3, max.
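
The lower-half / upper-half construction on these slides can be reproduced with the standard library. A minimal sketch, assuming the "median of each half" convention with the overall median excluded when the count is odd (other quartile conventions exist and can give slightly different values); the function name `five_number_summary` is made up for this example.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[:n // 2]           # values before the median position
    upper = ordered[n // 2 + n % 2:]   # skip the median itself when n is odd
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])


# Raw data from the slides.
raw = [3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10]
print(five_number_summary(raw))  # (2, 3, 6, 9, 10)
```

For the slide data this gives Q1 = 3 and Q3 = 9, the same quartiles the deck uses in its Tukey's fences example.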
15. ### 15 Tukey's Fences

An outlier is any observation outside the range:

[ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ]

where

• Q1 and Q3 are the lower and upper quartiles

• k is some non-negative constant

John Tukey proposed that

• k = 1.5 indicates an "outlier", and

• k = 3 indicates data that is "far out"

John Wilder Tukey (1915 – 2000)
16. ### 16 Tukey's Fences

Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

Q1 = 3, Q3 = 9

Interquartile range: Q3 - Q1 = 9 - 3 = 6

Lower outlier limit = Q1 - 1.5(Q3 - Q1) = 3 - 1.5 × 6 = -6

Upper outlier limit = Q3 + 1.5(Q3 - Q1) = 9 + 1.5 × 6 = 18

General form: [ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ]
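
The worked example can be checked with a short script. This is a sketch rather than a reference implementation: the helper names `tukey_fences` and `outliers` are made up here, quartiles are computed with the same median-of-halves rule as on the slides, and the extra values 25 and 40 are hypothetical additions used only to trigger the fences.

```python
from statistics import median


def tukey_fences(data, k=1.5):
    """Return (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = median(ordered[:n // 2])
    q3 = median(ordered[n // 2 + n % 2:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(data, k=1.5):
    """Return the observations that fall outside the fences."""
    low, high = tukey_fences(data, k)
    return [x for x in data if x < low or x > high]


sample = [2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10]
print(tukey_fences(sample))               # (-6.0, 18.0)
print(outliers(sample))                   # []

# Hypothetical extra observations, not part of the slide data.
extended = sample + [25, 40]
print(outliers(extended))                 # [25, 40]
print(outliers(extended, k=3))            # [40]
```

With k = 1.5 both added values are flagged as outliers; with k = 3 only 40 remains, which corresponds to the "far out" category.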
17. ### 17 Boxplot: five numbers summary

https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
18. ### 18 Boxplot: five numbers summary

Figure labels: Minimum, First Quartile, Median, Third Quartile, Maximum, IQR.

https://math.fandom.com/wiki/Box_Plot
19. ### 19 Boxplot: five numbers summary

Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10

Figure labels: lower half, upper half; min, Q1, median (Q2), Q3, max.

https://thestatsninja.com/2019/02/07/the-box-and-whisker-plot-for-grown-ups/
20. ### 20 Boxplot: five numbers summary

https://www.simplypsychology.org/boxplots.html
21. ### 21 Seasonal and Trend Decomposition (STL)
22. ### 22 Seasonal-Trend Decomposition using LOESS (STL)

STL decomposes a time series into three components using the LOESS method:

• trend

• seasonal

• residual (noise)

LOESS = LOcally EStimated Scatterplot Smoothing

Figure: LOESS curve approximation
23. ### 23 Raw data

Figure panels: raw data, and its components Trend + Seasonal + Remainder.

https://otexts.com/fpp2/stl.html
24. ### 24 Anomaly Detection using STL

Example: monthly airline passengers during the years 1949-1960

https://medium.com/wwblog/anomaly-detection-using-stl-76099c9fd5a7
25. ### 25 Anomaly Detection using STL

Example: web traffic data

https://www.webfx.com/blog/web-design/how-much-traffic-can-your-website-handle/
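
The idea of splitting a series into trend, seasonal and remainder and then looking for anomalies in the remainder can be illustrated without any external library. Note that the sketch below is not STL: it swaps in a classical additive decomposition (centred moving average instead of LOESS) to keep the example short and dependency-free. The function names, the synthetic weekly series and the injected spike are all assumptions made for illustration.

```python
from statistics import mean, median


def decompose(series, period):
    """Classical additive decomposition (NOT LOESS-based STL).

    `period` must be odd in this sketch so the moving average is centred.
    """
    half = period // 2
    n = len(series)
    # Trend: centred moving average over one full period (undefined at edges).
    trend = {t: mean(series[t - half:t + half + 1])
             for t in range(half, n - half)}
    detrended = {t: series[t] - trend[t] for t in trend}
    # Seasonal: mean detrended value per position in the cycle, centred on 0.
    by_pos = [mean(v for t, v in detrended.items() if t % period == p)
              for p in range(period)]
    offset = mean(by_pos)
    seasonal = [s - offset for s in by_pos]
    # Remainder: what is left after removing trend and seasonal parts.
    residual = {t: detrended[t] - seasonal[t % period] for t in trend}
    return trend, seasonal, residual


def residual_anomalies(residual, k=1.5):
    """Flag time indices whose remainder lies outside Tukey's fences."""
    ordered = sorted(residual.values())
    n = len(ordered)
    q1 = median(ordered[:n // 2])
    q3 = median(ordered[n // 2 + n % 2:])
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [t for t, r in residual.items() if r < low or r > high]


# Synthetic data: linear trend + weekly pattern, with a spike injected at t=20.
weekly = [-3, -2, -1, 0, 1, 2, 3]
series = [0.5 * t + weekly[t % 7] for t in range(42)]
series[20] += 30

trend, seasonal, residual = decompose(series, period=7)
anomalies = residual_anomalies(residual)
print(anomalies)
```

The injected spike produces by far the largest remainder, so index 20 is among the flagged points. A real STL implementation replaces the moving average with LOESS smoothing, which is more robust to exactly this kind of spike.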
26. ### 26 Statistical Hypothesis Test
27. ### 27 Statistical Hypothesis

A statistical hypothesis test is a method used to decide whether the data at hand support a particular hypothesis.

Null Hypothesis ($H_0$) and the Alternative Hypothesis ($H_A$):

$H_0$: The observed difference is due to chance alone. There are no anomalies.

$H_A$: Parameters of the distribution have changed. There is an anomaly.
28. ### 28 p-value

The p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.

The p-value is used to quantify the statistical significance of a result. A small p-value means that the observed outcome would be unlikely under the null hypothesis. Small p-values are strong evidence against the null hypothesis.
29. ### 29 p-value

Figure labels: Probability Density, Set of Possible Results, More Likely Observations, Anomaly (×2), Observed Data Point, P-value.
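
To make the definition concrete, here is a minimal sketch that computes a two-sided p-value for a single observation. It assumes the null hypothesis is a normal distribution with a known mean and standard deviation; the function name `two_sided_p_value`, the response-time scenario and all numbers are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(observed, mean, stdev):
    """P(result at least as extreme as `observed`) under a normal null."""
    null = NormalDist(mu=mean, sigma=stdev)
    if observed < mean:
        tail = null.cdf(observed)          # area to the left
    else:
        tail = 1.0 - null.cdf(observed)    # area to the right
    return 2.0 * tail                      # count both tails


# Hypothetical null model: response time ~ Normal(100 ms, 10 ms).
p_typical = two_sided_p_value(105, mean=100, stdev=10)
p_extreme = two_sided_p_value(130, mean=100, stdev=10)

alpha = 0.01  # assumed significance threshold
for value, p in [(105, p_typical), (130, p_extreme)]:
    verdict = "possible anomaly" if p < alpha else "consistent with H0"
    print(f"observed={value}: p={p:.4f} -> {verdict}")
```

An observation three standard deviations from the mean gets a p-value of roughly 0.003, so it would be flagged at the assumed threshold, while an observation half a standard deviation away would not.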
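30. ### 30 t-statistic

The t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesised value to its standard error:

$$t = \frac{\hat{\beta} - \beta_0}{\operatorname{SE}(\hat{\beta})}$$

where $\hat{\beta}$ is the estimated value, $\beta_0$ is the hypothesised value, and $\operatorname{SE}(\hat{\beta})$ is the standard error of the estimate.

It is used along with the p-value when running hypothesis tests: the p-value tells us how likely a result at least this extreme would be if the null hypothesis were true.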
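
For the common case where the parameter is a mean, the ratio above can be computed by hand. A small sketch, assuming a one-sample setting where the standard error is the sample standard deviation divided by the square root of the sample size; the function name `t_statistic` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, hypothesised_mean):
    """(estimate - hypothesised value) / standard error, for a sample mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # stdev uses the n-1 denominator
    return (mean(sample) - hypothesised_mean) / standard_error


# Hypothetical measurements; H0 says the true mean is 3.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t = t_statistic(sample, hypothesised_mean=3)
print(round(t, 4))  # 2.6458
```

The sample mean is 5, two units above the hypothesised value, and the standard error is about 0.756, so the estimate lies roughly 2.6 standard errors away from what the null hypothesis predicts.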
31. ### 31 SciPy

Algorithms for

• optimisation

• integration

• interpolation

• eigenvalue problems

• algebraic equations

• differential equations

• statistics

and many other classes of problems.

https://scipy.org/
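
As an illustration of the statistics part of SciPy, the sketch below runs a one-sample t-test with `scipy.stats.ttest_1samp`, which returns both the t-statistic and a two-sided p-value. The sample, the baseline mean and the significance level are hypothetical, and SciPy is assumed to be installed.

```python
from scipy import stats

# Hypothetical measurements and baseline; H0: the true mean equals baseline.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
baseline_mean = 3.0
alpha = 0.05  # assumed significance level

result = stats.ttest_1samp(sample, popmean=baseline_mean)
print(f"t = {result.statistic:.4f}, p = {result.pvalue:.4f}")

if result.pvalue < alpha:
    print("Reject H0: the mean appears to have shifted (possible anomaly).")
else:
    print("No evidence against H0 at this significance level.")
```

Here the library takes care of the t-distribution lookup that turns the t-statistic into a p-value, which is the step that is tedious to do by hand.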
32. ### 32 Thank you! Questions?