Anomaly Detection. Part 3 – Machine Learning Approach

Exactpro
June 06, 2022

Rostislav Yavorski
Head of Research, Exactpro

“In Lecture 3, we are going to discuss the unsupervised, supervised, and semi-supervised learning methods, placing special emphasis on clustering and dimensionality reduction.”

AI Testing Talks – Anomaly Detection. 6 June 2022

https://exactpro.com/events/external/ai-testing-talks-anomaly-detection?utm_source=speakerdeck&utm_medium=Refferer&utm_campaign=machine-learning

---

Follow us on
LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro

Transcript

  1. BUILD SOFTWARE TO TEST SOFTWARE
    AI Testing Talks
    exactpro.com
    Lecture 3. Machine Learning Approach
    MINI-COURSE ON ANOMALY DETECTION FOR AI TESTING
    6 June | 10.00 GET | 11.30 SLST
    Rostislav Yavorski
    Head of Research, Exactpro

  2. Terms
    An outlier is a data point that differs significantly from other observations.
    Anomalies are patterns in data that do not conform to a well-defined notion of normal behaviour.

  3. See Lecture 1 on Applications
    ● fraud detection
    ● health monitoring
    ● medical diagnosis
    ● system fault detection
    ● cyber-security intrusion detection
    ● improving the performance of machine learning algorithms
    https://youtu.be/Mi13lqDVET0

  4. Challenges in Anomaly Detection
    ● Defining normal behaviour is extremely challenging
    ● Noisy data points are not anomalies
    ● The definition of an anomaly is domain-specific
    ● Anomalies evolve over time
    ● Getting a set of labeled anomalous instances is difficult

  5. What is normal
    ● Average characteristics
    ● Similar to many
    ● Quite frequent occurrence
    ● Predictable from the past
    ● Labeled as normal

  6. See Lecture 2 on Statistical Methods
    ● Graphical Methods
    ● Interquartile Range
    ● Tukey's Fences
    ● Seasonal and Trend Decomposition
    ● Statistical Hypothesis Test
    ● p-value and t-statistic
    https://youtu.be/7Rz84cp1xQA
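
As a quick reminder of the Interquartile Range and Tukey's Fences items listed above, here is a minimal sketch in Python. The sample values and the conventional multiplier k = 1.5 are assumptions chosen for illustration; they are not taken from the lecture.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical response times in milliseconds; 95 is the suspicious value.
samples = [10, 12, 11, 13, 12, 11, 10, 12, 95]
lower, upper = tukey_fences(samples)
outliers = [x for x in samples if x < lower or x > upper]
print(outliers)  # prints [95]
```

Any value outside the two fences is reported as an outlier; a larger k makes the rule more tolerant.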

  7. (slide without text)

  8. (slide without text)

  9. Unsupervised learning

  10. Unsupervised anomaly detection
    The objective is to detect rare objects or events without any prior knowledge:
    Step 1. Modelling the normal data distribution
    Step 2. Defining a measurement to classify samples as anomalous or normal

  12. Unsupervised anomaly detection
    The objective is to detect rare objects or events without any prior knowledge:
    Step 1. Modelling the normal data distribution
    ● Clustering (grouping)
    ● Dimensionality (number of parameters) reduction
    Step 2. Defining a measurement to classify samples as anomalous or normal
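
The two steps above can be sketched with NumPy as follows. This is only one possible reading, not the method used in the lecture: the normal data distribution is assumed to be a single Gaussian, the measurement is assumed to be the squared Mahalanobis distance, and the data and the threshold are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 "normal" 2-D points plus one far-away point (index 200).
normal_points = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))
data = np.vstack([normal_points, [[8.0, 8.0]]])

# Step 1: model the normal data distribution (assumption: a single Gaussian).
mean = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))

# Step 2: define a measurement (squared Mahalanobis distance) and a threshold.
centered = data - mean
scores = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
threshold = 13.8  # assumed cut-off, about the 99.9th chi-square percentile for 2 dimensions
anomalies = np.flatnonzero(scores > threshold)
print(anomalies)
```

No labels are used anywhere: the model is fitted on all the data, and a sample is called anomalous only because it is unlikely under that model.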

  13. Clustering

  14. Unsupervised. Clustering
    Clustering is the task of dividing a population of data points into a number of groups such that data points in the same group are more similar to each other than to those in other groups.

  15. Clustering algorithms
    Connectivity models are based on distance connectivity
    Centroid models represent each cluster by a single mean vector
    Density models define clusters as connected dense regions in the data space
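
To make the three families concrete, here is a small sketch that uses one scikit-learn estimator per family. The pairing (AgglomerativeClustering for connectivity, KMeans for centroid, DBSCAN for density), the synthetic data, and all parameter values are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans

rng = np.random.default_rng(42)
# Hypothetical data: two tight blobs and one isolated point (last row).
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
points = np.vstack([blob_a, blob_b, [[10.0, -5.0]]])

# Connectivity model: merge points bottom-up by distance (single linkage).
connectivity_labels = AgglomerativeClustering(
    n_clusters=3, linkage="single"
).fit_predict(points)

# Centroid model: each cluster is represented by its mean vector.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
centroids = kmeans.cluster_centers_

# Density model: clusters are dense regions; sparse points get the label -1.
density_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(points)
print(density_labels[-1])
```

Of the three, only the density model has a built-in notion of "belongs to no cluster"; with the other two, an anomaly has to be inferred, for example from a cluster that contains a single point.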

  16. Agglomerative "bottom-up" algorithm
    Figure labels: chick, duckling, cat, rabbit, hen, dog, rooster, goose

  17. Agglomerative "bottom-up" algorithm
    Figure labels: rabbit, chick, duckling, cat, hen, dog, rooster, goose

  18. Figure labels: rabbit, chick, duckling, cat, hen, dog, goose, rooster, fish

  19. Figure labels: rabbit, chick, duckling, cat, hen, dog, goose, rooster, fish
    Anomaly
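
The bottom-up idea behind these animal slides can be sketched in plain Python. Everything below is an assumption made for illustration: the four features per animal are invented, single linkage is just one possible merge rule, and reading the "Anomaly" label as pointing at the fish is an interpretation of the figure.

```python
from itertools import combinations
from math import dist

# Made-up features: (pairs of legs, has feathers, has fur, swims).
animals = {
    "chick": (1, 1, 0, 0), "hen": (1, 1, 0, 0), "rooster": (1, 1, 0, 0),
    "duckling": (1, 1, 0, 1), "goose": (1, 1, 0, 1),
    "cat": (2, 0, 1, 0), "dog": (2, 0, 1, 0), "rabbit": (2, 0, 1, 0),
    "fish": (0, 0, 0, 1),
}

def agglomerate(items, max_distance):
    """Naive single-linkage clustering: start with one cluster per item and
    keep merging the two closest clusters until they are too far apart."""
    clusters = [[name] for name in items]

    def gap(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(dist(items[x], items[y]) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: gap(*pair))
        if gap(a, b) > max_distance:
            break
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

clusters = agglomerate(animals, max_distance=1.2)
anomalies = [c[0] for c in clusters if len(c) == 1]
print(anomalies)  # prints ['fish']
```

With these invented features the birds and the mammals each end up in one group, while the fish never gets close enough to join either, so it remains a cluster of size one.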

  20. Dimensionality Reduction

  21. What is dimension
    The dimension of a dataset is the number of attributes it contains.
    A dataset with a large number of attributes (a hundred or more) is referred to as high-dimensional data.
    Figure: two-dimensional array
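
In code, the dimension of a dataset stored as a two-dimensional array is simply its number of columns. The values below are hypothetical.

```python
import numpy as np

# Hypothetical dataset: 4 instances (rows) described by 3 attributes (columns).
dataset = np.array([
    [120.0, 4.0, 0.7],
    [85.0, 2.0, 0.3],
    [300.0, 9.0, 1.5],
    [42.0, 1.0, 0.1],
])
n_instances, n_attributes = dataset.shape
print(n_instances, n_attributes)  # prints 4 3
```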

  22. Dimensionality Reduction
    Dimensionality reduction is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data.
    Figure: three-dimensional coordinate system

  23. Example: Software Defect Prediction
    Number of instances: 10 885 modules. Creators: NASA, http://mdp.ivv.nasa.gov.
    Hypotheses:
    ● code with complicated pathways is more error-prone
    ● code that is hard to read is more likely to be fault-prone
    ● static measures are useful to guide software quality predictions
    ● static measures can never be a certain indicator of the presence of a fault
    Number of attributes (dimensionality): 22
    ● 5 different lines-of-code measures
    ● 3 McCabe metrics (cyclomatic, essential, design complexity)
    ● 4 base Halstead measures (volume, length, difficulty, intelligence)
    ● 8 derived Halstead measures
    ● 1 branch count
    ● 1 goal field (module has/has not one or more reported defects)
    https://www.kaggle.com/datasets/semustafacevik/software-defect-prediction

  24. Example. Operational Data from Enterprise Application
    https://www.kaggle.com/datasets/anomalydetectionml/rawdata
    Goal: effectively detect run-time anomalies using machine learning on operation metrics
    The dataset consists of metrics measured from the operating system and from WebLogic Server monitoring beans

  25. Dimensionality Reduction methods
    Only keep the most important features:
    ● Backward elimination
    ● Forward selection
    ● Random forests
    Find a combination of new features, linear methods:
    ● PCA
    ● Factor analysis
    ● LDA
    ● Truncated SVD
    Find a combination of new features, non-linear methods:
    ● Kernel PCA
    ● t-SNE
    ● MDS
    ● Isomap
    https://towardsdatascience.com/11-dimensionality-reduction-techniques-you-should-know-in-2021-dcb9500d388b
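
The difference between the two branches of this taxonomy can be shown with scikit-learn. This is a sketch under assumptions: the synthetic data, the choice of random forest importances for the first branch, and PCA for the second are picked for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical data: 300 samples, 6 features; only feature 0 determines the label.
X = rng.normal(size=(300, 6))
y = (X[:, 0] > 0).astype(int)

# Branch 1: only keep the most important features (random forest importances).
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
top_two = np.argsort(forest.feature_importances_)[::-1][:2]
X_selected = X[:, top_two]  # a subset of the original columns

# Branch 2: find a combination of new features (linear method: PCA).
X_combined = PCA(n_components=2).fit_transform(X)  # new columns, mixtures of all six

print(top_two, X_selected.shape, X_combined.shape)
```

Both results have two columns, but the selected columns keep their original meaning, while the PCA columns are new variables built from all six inputs.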

  26. Principal Component Analysis

  27. Principal Component Analysis (PCA)
    Principal components are new variables that are linear combinations of the initial variables.
    These combinations are constructed in such a way that the new variables are uncorrelated and most of the information within the initial variables is compressed into the first components.
    https://builtin.com/data-science/step-step-explanation-principal-component-analysis

  28. Principal Component Analysis (PCA)
    Prerequisites:
    ● Linear algebra
    ○ matrix multiplication, transposition, inverses
    ○ matrix decomposition
    ○ eigenvectors and eigenvalues
    ● Statistics
    ○ standardization, variance, covariance
    ○ independence
    ● Machine learning
    ○ linear regression
    ○ feature selection
    https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c

  29. Principal Component Analysis (PCA)
    1. Normalize the initial variables
    2. Compute the covariance matrix, which summarizes the correlations between all the possible pairs of variables
    3. Compute the eigenvectors and eigenvalues of the covariance matrix and rank the eigenvalues in descending order (this gives the principal components in order of significance)
    4. Compute the feature vector, which is a matrix that has as columns the eigenvectors of the components that we decide to keep
    5. Reorient the data from the original axes to the ones represented by the principal components
    https://builtin.com/data-science/step-step-explanation-principal-component-analysis
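
The five steps above map almost line by line onto NumPy. The sketch below uses synthetic data (three variables, two of them strongly correlated) and keeps k = 2 components; both choices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 3 variables, the second strongly correlated with the first.
x1 = rng.normal(size=500)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

# 1. Normalize (standardize) the initial variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Compute the covariance matrix.
C = np.cov(Z, rowvar=False)

# 3. Compute eigenvectors and eigenvalues, ranked by descending eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(C)  # eigh, because C is symmetric
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 4. Feature vector: the eigenvectors of the components we keep, as columns.
k = 2
feature_vector = eigenvectors[:, :k]

# 5. Reorient the data onto the axes given by the principal components.
projected = Z @ feature_vector

explained = eigenvalues / eigenvalues.sum()
print(np.round(explained, 3))
```

Because two of the three variables carry almost the same information, the first two components retain nearly all of the variance, and the projected columns are uncorrelated with each other.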

  30. Factor Analysis

  31. Factor Analysis (FA)
    It is a method for modeling observed variables and their relationships in terms of unobserved variables, i.e. factors
    It is used to reduce a large number of variables to a smaller number of factors
    Factor analysis is a kind of extension of PCA
    FA is a complex mathematical procedure, so it is performed with software
    Figure labels: Var1, Var2, Var3, Var4, Var5, Var6, Var7, Var8, Factor A, Factor B
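
As an example of performing FA with software, the sketch below uses scikit-learn's FactorAnalysis on synthetic data shaped like the figure: eight observed variables generated from two hidden factors. The loadings, noise level, and sample size are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_samples = 1000
# Two unobserved factors (playing the role of Factor A and Factor B).
factors = rng.normal(size=(n_samples, 2))
# Assumed loadings: Var1..Var4 depend on the first factor, Var5..Var8 on the second.
loadings = np.zeros((2, 8))
loadings[0, :4] = 1.0
loadings[1, 4:] = 1.0
observed = factors @ loadings + rng.normal(scale=0.3, size=(n_samples, 8))

fa = FactorAnalysis(n_components=2, random_state=0)
factor_scores = fa.fit_transform(observed)  # 8 observed variables -> 2 factor scores
print(factor_scores.shape)
print(np.round(fa.components_, 2))  # estimated loadings, shape (2, 8)
```

Note that the estimated loadings are only determined up to a rotation, so they need not match the block structure used to generate the data.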

  32. Factor Analysis (FA)
    There are two types of FA, exploratory (EFA) and confirmatory (CFA)
    EFA is used to find the underlying structure of a large set of variables and reduce data to a smaller set of summary variables, but EFA can generate a large number of possible models for your data
    If you do have an idea about what the models look like, and you want to test your hypotheses about the data structure, CFA is a better approach
    https://www.statisticshowto.com/factor-analysis/
    Figure: geometric interpretation of Factor Analysis

  33. Supervised learning

  34. Supervised anomaly detection
    Items in the training dataset are labeled into two categories: normal and abnormal
    The model will use these examples to recognize abnormal patterns in the previously unseen data
    It is rarely used due to the unavailability of labeled data for the known anomalies
    https://www.enjoyalgorithms.com/blogs/supervised-unsupervised-and-semisupervised-learning
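
A minimal sketch of the supervised setting, assuming labeled examples are available. The synthetic data, the choice of logistic regression, and the class weighting are illustrative assumptions, not the lecture's method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical labeled data: 0 = normal, 1 = abnormal (rare).
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
abnormal = rng.normal(loc=6.0, scale=1.0, size=(25, 2))
X = np.vstack([normal, abnormal])
y = np.array([0] * 500 + [1] * 25)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" compensates for the rarity of the abnormal class.
model = LogisticRegression(class_weight="balanced").fit(X_train, y_train)
predictions = model.predict(X_test)  # labels for previously unseen data
accuracy = (predictions == y_test).mean()
print(accuracy)
```

The stratified split keeps the small share of abnormal items in both the training and the test set, which matters precisely because labeled anomalies are scarce.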

  35. Conclusion

  36. https://github.com/openvinotoolkit/anomalib
    A library for benchmarking, developing and deploying deep learning anomaly detection algorithms

  37. https://www.oreilly.com/library/view/hands-on-unsupervised-learning/9781492035633/

  38. Thank you!
    Questions?