BUILD SOFTWARE TO TEST SOFTWARE
AI Testing Talks
exactpro.com
Lecture 2.
Statistical Methods
ANOMALY DETECTION FOR AI TESTING
Rostislav Yavorski
Head of Research, Exactpro
30 MAY | 10.00 GET | 11.30 SLST
Terms
An outlier is a data point that differs significantly from other observations.
Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.
Plan
1. Graphical Methods
2. Interquartile Range
3. Tukey's Fences
4. Seasonal and Trend Decomposition (STL)
5. Statistical Hypothesis Test
6. p-value and t-statistic
7. SciPy library
Graphical Methods
Histogram
First, divide the entire range of values into a series of intervals ("bins" or "buckets"), then count how many values fall into each interval.
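A minimal sketch of the binning step with NumPy, assuming equal-width bins. The data set is the ordered sample used later in the quartile example, and the choice of 4 bins is arbitrary; `np.histogram` returns the count per bin and the bin edges.

```python
import numpy as np

data = [2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10]

# Split the range [min, max] into 4 equal-width bins and count the values in each.
# Every bin is half-open [left, right) except the last one, which also includes max.
counts, edges = np.histogram(data, bins=4)

# Text rendering of the histogram: one '#' per value in the bin.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f} | {'#' * int(count)}")
```

Unusually tall or isolated bars are the first visual hint of values that do not follow the rest of the distribution.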
Scatter Plot
A scatter plot displays the relationship between two numeric variables.
The position of each dot on the horizontal and vertical axes gives the values of one data point.
Temperature (°C) | Ice Cream Sales ($)
14.2 | 215
16.4 | 325
11.9 | 185
18.5 | 406
22.1 | 522
19.4 | 412
25.1 | 614
23.4 | 544
15.2 | 332
18.1 | 421
22.6 | 445
17.2 | 408
Source: https://www.mathsisfun.com/data/scatter-xy-plots.html
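The scatter plot shows the relationship visually. A sketch of two numeric summaries that go with it, using the temperature and sales table: Pearson's correlation coefficient and a least-squares line. Neither is on the original slide; they are added here as an illustration. Dots with a large residual from the line are the ones that stand out from the overall trend in the plot.

```python
import numpy as np

# Data from the table: temperature (°C) and ice cream sales ($).
temperature = np.array([14.2, 16.4, 11.9, 18.5, 22.1, 19.4, 25.1, 23.4, 15.2, 18.1, 22.6, 17.2])
sales = np.array([215, 325, 185, 406, 522, 412, 614, 544, 332, 421, 445, 408])

# Pearson correlation coefficient: +1 would be a perfect increasing linear relationship.
r = np.corrcoef(temperature, sales)[0, 1]

# Least-squares straight line through the cloud of dots (slope first, then intercept).
slope, intercept = np.polyfit(temperature, sales, deg=1)
residuals = sales - (slope * temperature + intercept)
farthest = int(np.argmax(np.abs(residuals)))  # dot farthest from the trend line

print(f"r = {r:.2f}")
print(f"largest residual: {temperature[farthest]} °C, ${sales[farthest]}")
```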
Interquartile Range
Quartile
Q1 is the middle value between the minimum and the median of the data set.
Q2 (the median) is the value separating the higher half of the data set from the lower half.
Q3 is the middle value between the median and the maximum of the data set.
The interquartile range is the distance between the first and third quartiles: IQR = Q3 - Q1.
Quartile
[Figure: ordered data split by Q1 (1st quartile), Q2 (median) and Q3 (3rd quartile) into four parts, each holding ¼ of the data]
Q1, the first quartile: 25% of the data lies below this point.
Q2, the second quartile: 50% of the data lies below this point (it is the median).
Q3, the third quartile: 75% of the data lies below this point.
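Quartile
Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
[Figure: the ordered data from min to max, with the median (Q2) splitting it into a lower half and an upper half, and Q1 and Q3 marking the middle of each half]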
Five-number summary of the ordered data: minimum = 2, Q1 = 3, median (Q2) = 6, Q3 = 9, maximum = 10.
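A sketch that reproduces the five-number summary above. Quartile conventions differ between textbooks and libraries; this code assumes the "median of each half" rule, with the median left out of both halves when the number of values is odd. NumPy's default linear interpolation in `np.percentile` happens to give the same quartiles for this data set, but not for every data set.

```python
import numpy as np

def median(values):
    """Middle value of a sorted list (mean of the two middle values if the length is even)."""
    n = len(values)
    mid = n // 2
    return values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2

def five_number_summary(data):
    """(min, Q1, median, Q3, max) using the median-of-halves rule.

    With an odd number of values, the median is excluded from both halves.
    """
    s = sorted(data)
    n = len(s)
    lower_half = s[: n // 2]
    upper_half = s[(n + 1) // 2 :]
    return s[0], median(lower_half), median(s), median(upper_half), s[-1]

data = [2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10]
print(five_number_summary(data))          # (2, 3, 6, 9, 10)
print(np.percentile(data, [25, 50, 75]))  # NumPy's default (linear) method
```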
Tukey's Fences
An outlier is any observation outside the range:
[ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ]
where
● Q1 and Q3 are the lower and upper quartiles
● k is some non-negative constant
John Tukey proposed that
● k = 1.5 indicates an "outlier", and
● k = 3 indicates data that is "far out"
John Wilder Tukey (1915 – 2000)
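A minimal sketch of Tukey's fences, assuming NumPy's default percentile method for Q1 and Q3. On the ordered data from the quartile example (Q1 = 3, Q3 = 9, IQR = 6) the fences for k = 1.5 are [-6, 18], so nothing is flagged; the value 40 is a hypothetical addition used to show a flagged point.

```python
import numpy as np

def tukey_fences(values, k=1.5):
    """Return the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)

def tukey_outliers(values, k=1.5):
    """Values outside Tukey's fences: k = 1.5 for "outliers", k = 3 for "far out"."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]

data = [2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10]
print(tukey_fences(data))                 # (-6.0, 18.0): nothing is flagged
print(tukey_outliers(data + [40]))        # [40]: outside the k = 1.5 fences
print(tukey_outliers(data + [40], k=3))   # [40]: also "far out"
```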
Boxplot: five-number summary
[Figure: boxplot of a data set showing its five-number summary]
Source: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
[Figure: boxplot on a scale from 0 to 2.0 marking the minimum, first quartile, median, third quartile and maximum, with the box spanning the IQR]
Source: https://math.fandom.com/wiki/Box_Plot
[Figure: boxplot of the ordered data 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10, marking min, Q1, median (Q2), Q3 and max, and the lower and upper halves]
Source: https://thestatsninja.com/2019/02/07/the-box-and-whisker-plot-for-grown-ups/
[Figure: boxplot example]
Source: https://www.simplypsychology.org/boxplots.html
Seasonal and Trend Decomposition (STL)
Seasonal-Trend Decomposition using LOESS (STL)
STL uses the LOESS method to decompose a time series into three components:
● trend
● seasonal
● residual (noise)
LOESS = LOcally EStimated Scatterplot Smoothing
[Figure: LOESS curve approximation]
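To make "locally estimated" concrete, here is a minimal sketch of degree-1 LOESS in NumPy: for every point, fit a straight line to its nearest neighbours by weighted least squares, with tricube weights that fall to zero at the edge of the neighbourhood. The function name `loess` and the `frac` parameter (share of points per neighbourhood) are illustrative choices, not a library API; STL itself applies LOESS smoothers repeatedly, which this sketch does not attempt.

```python
import numpy as np

def loess(x, y, frac=0.5):
    """Degree-1 LOESS: fit a weighted straight line around every x[i].

    frac is the share of points in each local neighbourhood (the span).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    q = min(n, max(3, int(np.ceil(frac * n))))  # neighbours per local fit
    design = np.column_stack([np.ones(n), x])   # columns for intercept and slope
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[q - 1]                         # distance to the q-th nearest point
        w = np.clip(1 - (dist / h) ** 3, 0, None) ** 3   # tricube weights, 0 at and beyond h
        sw = np.sqrt(w)
        # Weighted least squares: minimise sum(w * (y - a - b*x)^2).
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted

# Usage: smooth a noisy sine wave.
x = np.linspace(0, 2 * np.pi, 40)
rng = np.random.default_rng(0)
noisy = np.sin(x) + rng.normal(scale=0.3, size=x.size)
smooth = loess(x, noisy, frac=0.3)
```

A smaller `frac` follows the data more closely; a larger one gives a smoother curve.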
[Figure: STL decomposition of a time series: raw data = trend + seasonal + remainder]
Source: https://otexts.com/fpp2/stl.html
Anomaly Detection using STL
Example: monthly airline passengers, 1949-1960
Source: https://medium.com/wwblog/anomaly-detection-using-stl-76099c9fd5a7
Example: web traffic data
Source: https://www.webfx.com/blog/web-design/how-much-traffic-can-your-website-handle/
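The idea behind anomaly detection with a decomposition is to remove trend and seasonality, then look for outliers in what is left. A NumPy-only sketch under simplifying assumptions: it swaps STL's LOESS smoothers for a classical decomposition (centred moving-average trend, per-phase median seasonal profile) and flags remainders outside Tukey's fences. For real STL, `statsmodels.tsa.seasonal.STL` is one option; the series below is synthetic.

```python
import numpy as np

def decompose(y, period):
    """Additive decomposition y = trend + seasonal + remainder.

    Simplified stand-in for STL: the trend is a centred moving average and the
    seasonal part is the per-phase median of the detrended series, not LOESS.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centred moving average; a "2 x period" average when the period is even.
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined for the first and last `half` points
    trend[half:n - half] = np.convolve(y, weights, mode="valid")

    # Seasonal profile: median detrended value at each phase, centred on zero.
    detrended = y - trend
    profile = np.array([np.nanmedian(detrended[i::period]) for i in range(period)])
    profile -= profile.mean()
    seasonal = np.tile(profile, n // period + 1)[:n]

    remainder = y - trend - seasonal
    return trend, seasonal, remainder


def remainder_anomalies(remainder, k=1.5):
    """Indices whose remainder lies outside Tukey's fences (NaN edges skipped)."""
    valid = ~np.isnan(remainder)
    q1, q3 = np.percentile(remainder[valid], [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    filled = np.where(valid, remainder, q1)  # NaN positions can never be flagged
    return np.flatnonzero((filled < low) | (filled > high))


# Synthetic series: linear trend + seasonality with period 12 + one injected spike.
t = np.arange(48)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
y[30] += 40
trend, seasonal, resid = decompose(y, period=12)
print(remainder_anomalies(resid))  # [30]
```

The median-based seasonal profile is a deliberate choice. With a plain mean, the spike would leak into the seasonal component and make other points at the same phase look anomalous.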
Statistical Hypothesis Test
Statistical Hypothesis
A statistical hypothesis test is a method used to decide
whether the data at hand support a particular hypothesis.
Null hypothesis (H0) and alternative hypothesis (HA):
H0: The observed difference is due to chance alone. There are no anomalies.
HA: Parameters of the distribution have changed. There is an anomaly.
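A sketch of this framing with SciPy, assuming the parameter of interest is the mean: H0 says a new window of measurements has the same mean as a baseline, and HA says the mean has changed. It uses Welch's two-sample t-test (`scipy.stats.ttest_ind` with `equal_var=False`). The significance level alpha = 0.05 is a common convention, not a rule, and the response times are hypothetical.

```python
from scipy import stats

def mean_shift_detected(baseline, window, alpha=0.05):
    """Reject H0 ("same mean") in favour of HA ("the mean has changed") if p < alpha."""
    result = stats.ttest_ind(baseline, window, equal_var=False)  # Welch's t-test
    return bool(result.pvalue < alpha), float(result.pvalue)

# Hypothetical response times (ms) of a system under test.
baseline = [100, 102, 98, 101, 99, 100, 103, 97, 101, 99]
shifted = [110, 112, 108, 111, 109, 110, 113, 107, 111, 109]

print(mean_shift_detected(baseline, baseline))  # not significant: keep H0
print(mean_shift_detected(baseline, shifted))   # significant: reject H0, an anomaly
```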
p-value
The p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.
The p-value is used to quantify the statistical significance of a result.
A small p-value means that the observed outcome would be unlikely under the null hypothesis: the smaller the p-value, the stronger the evidence against the null hypothesis.
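A worked example of the definition, assuming the test statistic Z follows a standard normal distribution under H0 (a z-test). A two-sided p-value is the probability that |Z| is at least as large as the observed value, that is, the area in both tails. `scipy.stats.norm.sf` is the survival function, 1 - CDF.

```python
from scipy.stats import norm

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal test statistic Z under H0."""
    return 2 * norm.sf(abs(z))

print(round(two_sided_p_value(1.96), 3))  # 0.05
print(round(two_sided_p_value(3.0), 4))   # 0.0027
```

An observation three standard deviations from the expected value would occur by chance only about 0.27% of the time if H0 were true.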
p-value
[Figure: probability density over the set of possible results, with the more likely observations in the middle, anomalies in both tails, and the p-value shown as the area beyond the observed data point]
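t-statistic
The ratio of the departure of the estimated value of a parameter from its hypothesised value to its standard error:
$$ t = \frac{\hat{\beta} - \beta_0}{\operatorname{s.e.}(\hat{\beta})} $$
where $\hat{\beta}$ is the estimated value of the parameter, $\beta_0$ is its hypothesised value under H0 and $\operatorname{s.e.}(\hat{\beta})$ is the standard error of the estimate.
The t-statistic is used together with the p-value in hypothesis tests: it measures how many standard errors the estimate lies from the hypothesised value, and the p-value gives the probability of a t-statistic at least this extreme if the null hypothesis is true.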
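For a one-sample test of a mean, the estimate is the sample mean, the hypothesised value is a mean μ0, and the standard error is s/√n (s is the sample standard deviation). This sketch computes t and its two-sided p-value by hand, then checks both against `scipy.stats.ttest_1samp`. The sample values and μ0 = 100 are hypothetical.

```python
import numpy as np
from scipy import stats

sample = np.array([101.2, 99.8, 102.5, 100.9, 103.1, 101.7, 100.4, 102.2])
mu0 = 100.0  # hypothesised mean under H0 (illustrative value)

n = len(sample)
std_error = sample.std(ddof=1) / np.sqrt(n)          # standard error of the mean
t_manual = (sample.mean() - mu0) / std_error         # departure / standard error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)   # two-sided p-value, n - 1 degrees of freedom

result = stats.ttest_1samp(sample, popmean=mu0)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```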
SciPy – algorithms for
● optimisation
● integration
● interpolation
● eigenvalue problems
● algebraic equations
● differential equations
● statistics
and many other classes of problems.
https://scipy.org/
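As a small illustration of the statistics part of SciPy, `scipy.stats.iqr` computes Q3 - Q1 directly and can drive Tukey's fences. By default it uses the 25th and 75th percentiles with linear interpolation, matching `np.percentile`. The value 40 appended to the quartile-example data is hypothetical.

```python
import numpy as np
from scipy import stats

data = np.array([2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10, 40])

iqr = stats.iqr(data)                   # Q3 - Q1 (25th to 75th percentile by default)
q1, q3 = np.percentile(data, [25, 75])
k = 1.5                                 # Tukey's "outlier" constant
outliers = data[(data < q1 - k * iqr) | (data > q3 + k * iqr)]

print(iqr, outliers)
```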
Thank you!
Questions?