Anomaly Detection. Part 2 – Statistical Methods

Exactpro
May 30, 2022

Rostislav Yavorski
Head of Research, Exactpro

“In Lecture 2, we are going to discuss the graphical methods (histogram, box plot, and scatter plot), as well as the interquartile range, Tukey's fences, the null hypothesis, the t-statistic, and the p-value.”

AI Testing Talks – Anomaly Detection. 30 May 2022

https://exactpro.com/events/external/ai-testing-talks-anomaly-detection?utm_source=speakerdeck&utm_medium=Refferer&utm_campaign=statistical-methods

---

Transcript

  1. BUILD SOFTWARE TO TEST SOFTWARE
    AI Testing Talks
    exactpro.com
    Lecture 2. Statistical Methods
    ANOMALY DETECTION FOR AI TESTING
    Rostislav Yavorski
    Head of Research, Exactpro
    30 MAY | 10.00 GET | 11.30 SLST

  2. Terms
    An outlier is a data point that differs significantly from other observations.
    Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.

  3. Plan
    1. Graphical Methods
    2. Interquartile Range
    3. Tukey's Fences
    4. Seasonal and Trend Decomposition (STL)
    5. Statistical Hypothesis Test
    6. p-value and t-statistic
    7. SciPy library

  4. Graphical Methods

  5. Histogram
    First, divide the entire range of values into a series of intervals ("bins" or "buckets"), and then count how many values fall into each interval.
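
The binning step described on the histogram slide can be sketched in a few lines of plain Python. The sample values and the choice of five equal-width bins are assumptions made for illustration; they do not come from the lecture.

```python
# Hypothetical sample; the value 20 sits far from the rest.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
num_bins = 5  # assumed number of bins

lo, hi = min(values), max(values)
width = (hi - lo) / num_bins  # equal-width bins covering [lo, hi]

counts = [0] * num_bins
for v in values:
    # The maximum value would fall just past the last bin, so clamp it into it.
    index = min(int((v - lo) / width), num_bins - 1)
    counts[index] += 1

# Text rendering: one row per bin, one '#' per value in that bin.
for i, count in enumerate(counts):
    left = lo + i * width
    print(f"from {left:5.1f} to {left + width:5.1f}: {'#' * count}")
```

An isolated bar separated from the bulk of the data by empty bins is the kind of pattern a histogram makes visible at a glance.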

  6. [Image only; no text was extracted for this slide]

  7. Scatter Plot
    A scatter chart displays the relationship between two numeric variables.
    The position of each dot on the horizontal and vertical axes indicates the values for one data point.

    Temperature (°C)   Ice Cream Sales
    14.2               $215
    16.4               $325
    11.9               $185
    18.5               $406
    22.1               $522
    19.4               $412
    25.1               $614
    23.4               $544
    15.2               $332
    18.1               $421
    22.6               $445
    17.2               $408

    https://www.mathsisfun.com/data/scatter-xy-plots.html
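
A scatter plot is usually read by eye, looking for dots that sit away from the general trend. As a rough numerical counterpart, the sketch below fits a least-squares line to the temperature and sales pairs from the slide and reports the point with the largest residual. The straight-line fit is an assumption added here for illustration; the slide itself only shows the plot.

```python
# (temperature in °C, ice cream sales in $) pairs from the slide.
points = [
    (14.2, 215), (16.4, 325), (11.9, 185), (18.5, 406),
    (22.1, 522), (19.4, 412), (25.1, 614), (23.4, 544),
    (15.2, 332), (18.1, 421), (22.6, 445), (17.2, 408),
]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Ordinary least-squares line: y = slope * x + intercept.
sxx = sum((x - mean_x) ** 2 for x, _ in points)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed sales minus the sales predicted by the line.
residuals = [(y - (slope * x + intercept), (x, y)) for x, y in points]
largest_residual, furthest_point = max(residuals, key=lambda r: abs(r[0]))

print(f"slope = {slope:.1f} $/°C, intercept = {intercept:.1f} $")
print(f"furthest from the line: {furthest_point}, residual {largest_residual:.1f} $")
```

A large residual only says that a point is unusual relative to a straight line; whether it counts as an anomaly still depends on the context.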

  8. [Image only; no text was extracted for this slide]

  9. Interquartile Range

  10. Quartile
    Q1 is the middle number between the minimum and the median of the data set.
    Q2 (median) is the value separating the higher half from the lower half of the data set.
    Q3 is the middle value between the median and the maximum of the data set.

  11. Quartile
    Q1, the first quartile: 25% of the data lies below this point.
    Q2, the second quartile: 50% of the data lies below this point (it is the median).
    Q3, the third quartile: 75% of the data lies below this point.
    [Figure: Q1 (1st quartile), Q2 (median), and Q3 (3rd quartile) split the data into four parts, each holding ¼ of the data.]

  12. Quartile
    Raw data: 3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10
    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
    [Figure: the ordered data is split into a lower half and an upper half.]

  13. Quartile
    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
    [Figure: min, Q1, median (Q2), Q3, and max marked on the ordered data, with the lower half and the upper half on either side of the median.]

  14. Quartile: five-number summary
    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
    Five-number summary: min = 2, Q1 = 3, median (Q2) = 6, Q3 = 9, max = 10
    [Figure: min, Q1, median (Q2), Q3, and max marked on the ordered data.]
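
The quartile definitions from the preceding slides can be checked on the lecture's sample data. This is a minimal sketch that follows the "median of each half" description (Q1 is the median of the lower half, Q3 the median of the upper half, with the middle value left out when the count is odd). Other quartile conventions, for example interpolation-based ones, can give slightly different values on other data sets.

```python
from statistics import median

raw = [3, 2, 3, 4, 9, 2, 10, 6, 8, 9, 3, 9, 8, 4, 10]
ordered = sorted(raw)

n = len(ordered)
half = n // 2
lower_half = ordered[:half]          # values before the median position
upper_half = ordered[half + n % 2:]  # skip the middle value when n is odd

q1 = median(lower_half)
q2 = median(ordered)
q3 = median(upper_half)

# Five-number summary: minimum, Q1, median, Q3, maximum.
five_number_summary = (ordered[0], q1, q2, q3, ordered[-1])
print(five_number_summary)
```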

  15. Tukey's Fences
    An outlier is any observation outside the range:
    [ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ]
    where
    ● Q1 and Q3 are the lower and upper quartiles
    ● k is some non-negative constant
    John Tukey proposed that
    ● k = 1.5 indicates an "outlier", and
    ● k = 3 indicates data that is "far out"
    John Wilder Tukey (1915 – 2000)

  16. Tukey's Fences
    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
    [ Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1) ]
    Q1 = 3, Q3 = 9
    Interquartile range: Q3 - Q1 = 9 - 3 = 6
    Lower outlier limit = Q1 - 1.5(Q3 - Q1) = 3 - 1.5 × 6 = -6
    Upper outlier limit = Q3 + 1.5(Q3 - Q1) = 9 + 1.5 × 6 = 18
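
The fence calculation above can be sketched as a small helper. The function name `tukey_fences` is hypothetical, and the two extra observations (25 and 40) are invented here only to show values that fall outside the fences. The lecture's own sample contains no outliers.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


ordered = [2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10]
q1, q3 = 3, 9  # quartiles of the ordered data, as given on the slide

lower, upper = tukey_fences(q1, q3, k=1.5)     # "outlier" fences
far_lower, far_upper = tukey_fences(q1, q3, k=3)  # "far out" fences

# 25 and 40 are hypothetical observations, not part of the lecture's data.
observations = ordered + [25, 40]
outliers = [v for v in observations if v < lower or v > upper]
far_out = [v for v in observations if v < far_lower or v > far_upper]

print(f"k = 1.5 fences: [{lower}, {upper}], outliers: {outliers}")
print(f"k = 3 fences:   [{far_lower}, {far_upper}], far out: {far_out}")
```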

  17. Boxplot: five-number summary
    https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

  18. Boxplot: five-number summary
    [Figure: box plot with the Minimum, First Quartile, Median, Third Quartile, Maximum, and IQR labelled.]
    https://math.fandom.com/wiki/Box_Plot

  19. Boxplot: five-number summary
    Ordered data: 2, 2, 3, 3, 3, 4, 4, 6, 8, 8, 9, 9, 9, 10, 10
    [Figure: min, Q1, median (Q2), Q3, and max marked on the ordered data, with the lower half and the upper half on either side of the median.]
    https://thestatsninja.com/2019/02/07/the-box-and-whisker-plot-for-grown-ups/

  20. Boxplot: five-number summary
    https://www.simplypsychology.org/boxplots.html

  21. Seasonal and Trend Decomposition (STL)

  22. Seasonal-Trend Decomposition using LOESS (STL)
    STL decomposes a time series into three components:
    ● trend
    ● seasonal
    ● residual (noise)
    using the LOESS method.
    LOESS = LOcally EStimated Scatterplot Smoothing
    [Figure: LOESS curve approximation]

  23. [Figure: raw data shown as the sum of three components: Trend + Seasonal + Remainder]
    https://otexts.com/fpp2/stl.html

  24. Anomaly Detection using STL
    Example: monthly airline passengers during the years 1949-1960
    https://medium.com/wwblog/anomaly-detection-using-stl-76099c9fd5a7

  25. Anomaly Detection using STL
    Example: web traffic data
    https://www.webfx.com/blog/web-design/how-much-traffic-can-your-website-handle/
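
The idea behind decomposition-based anomaly detection can be illustrated without a LOESS implementation. The sketch below is not STL. It swaps in a simpler classical additive decomposition (centred moving average for the trend, per-position averages for the seasonal part) and then applies Tukey's fences to the residuals. The series, its period of 5, and the spike at t = 17 are all invented for this example.

```python
from statistics import mean, quantiles

period = 5
pattern = [3, 1, -2, -3, 1]  # hypothetical seasonal pattern, sums to zero

# Hypothetical series: linear trend + repeating pattern, plus one spike at t = 17.
series = [20 + 0.5 * t + pattern[t % period] for t in range(30)]
series[17] += 10

# Trend: centred moving average over one full period (undefined at the edges).
half = period // 2
times = range(half, len(series) - half)
trend = {t: mean(series[t - half:t + half + 1]) for t in times}

# Seasonal: average detrended value for each position within the period.
detrended = {t: series[t] - trend[t] for t in times}
seasonal = {
    phase: mean(v for t, v in detrended.items() if t % period == phase)
    for phase in range(period)
}

# Residual: what remains after removing the trend and seasonal components.
residual = {t: detrended[t] - seasonal[t % period] for t in times}

# Flag residuals outside Tukey's fences with k = 1.5.
q1, _, q3 = quantiles(residual.values(), n=4)
iqr = q3 - q1
anomalies = [t for t, r in residual.items()
             if r < q1 - 1.5 * iqr or r > q3 + 1.5 * iqr]
print("anomalous time steps:", anomalies)
```

The spike leaks into the moving-average trend and the seasonal averages of its neighbours, which is one reason robust smoothers such as LOESS are preferred in practice.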

  26. Statistical Hypothesis Test

  27. Statistical Hypothesis
    A statistical hypothesis test is a method used to decide whether the data at hand support a particular hypothesis.
    Null Hypothesis (H0) and Alternative Hypothesis (HA):
    H0: The observed difference is due to chance alone. There are no anomalies.
    HA: Parameters of the distribution have changed. There is an anomaly.

  28. p-value
    The p-value is the probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct.
    The p-value is used to quantify the statistical significance of a result.
    A small p-value means that the observed outcome would be unlikely under the null hypothesis. Small p-values are strong evidence against the null hypothesis.
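
As a minimal sketch of the p-value definition, assume that under the null hypothesis a metric is normally distributed with a known mean and standard deviation. The baseline (mean 100, standard deviation 10), the observed value 130, and the 0.05 significance level are all hypothetical. The two-sided p-value is then the probability, under H0, of a value at least as far from the mean as the one observed.

```python
from statistics import NormalDist

# Hypothetical baseline under H0: normal with mean 100 and standard deviation 10.
null_distribution = NormalDist(mu=100, sigma=10)

observed = 130  # hypothetical new observation
alpha = 0.05    # hypothetical significance level

# Standardise the observation, then take the area of both tails beyond |z|.
z = (observed - null_distribution.mean) / null_distribution.stdev
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("reject H0: the observation is a candidate anomaly")
else:
    print("no evidence against H0")
```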

  29. p-value
    [Figure: probability density over the set of possible results, labelled "More Likely Observations", "Observed Data Point", "P-value", and "Anomaly" (twice).]

  30. t-statistic
    The ratio of the departure of the estimated value of a parameter from its hypothesised value to its standard error:
    t = (estimated value - hypothesised value) / standard error
    It is used along with the p-value when running hypothesis tests: the p-value tells us how probable a result at least this extreme would be if the null hypothesis were true.
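
For the common case where the parameter is a population mean, the ratio described on the t-statistic slide becomes the one-sample t-statistic: the sample mean minus the hypothesised mean, divided by the standard error of the mean. The sample values and the hypothesised mean of 5.0 below are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

sample = [5.1, 4.9, 5.6, 5.8, 6.0]  # hypothetical measurements
hypothesised_mean = 5.0             # mean under H0 (assumed)

estimate = mean(sample)
# Standard error of the mean; stdev() uses the n - 1 denominator.
standard_error = stdev(sample) / sqrt(len(sample))

t_statistic = (estimate - hypothesised_mean) / standard_error
print(f"estimate = {estimate:.2f}, SE = {standard_error:.3f}, t = {t_statistic:.3f}")
```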

  31. SciPy – algorithms for
    ● optimisation
    ● integration
    ● interpolation
    ● eigenvalue problems
    ● algebraic equations
    ● differential equations
    ● statistics
    and many other classes of problems.
    https://scipy.org/
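
As one example of SciPy's statistics routines, `scipy.stats.ttest_1samp` runs a one-sample t-test and returns both the t-statistic and the two-sided p-value. The sample and the hypothesised mean are hypothetical. The lecture does not say which SciPy functions were demonstrated, so treat this as one plausible usage, not a record of the talk.

```python
from scipy import stats

sample = [5.1, 4.9, 5.6, 5.8, 6.0]  # hypothetical measurements
hypothesised_mean = 5.0             # mean under H0 (assumed)

# One-sample, two-sided t-test of H0: the population mean equals hypothesised_mean.
result = stats.ttest_1samp(sample, popmean=hypothesised_mean)

print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")
```

With so few observations the test has little power, so a p-value above a chosen threshold is not evidence that the mean is unchanged.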

  32. Thank you!
    Questions?