Statistics of Web Performance Measurement and Anomaly Detection

The first step to improving any process or product is to measure how it performs. The second step is data analysis. In this talk, we will analyse page load time experienced by real users and associated business metrics. We will look at various statistical methods and algorithms that may be used, and figure out the ones that make the most sense given the dataset we have. Should we use the mean, median or mode? Is our distribution Normal, Log-Normal or something else? What is the system’s steady-state, and how will I know if it deviates? Can my alerts adapt to the seasonality of my traffic? What is Holt-Winters, how come Nelson Rules and is Kolmogorov-Smirnov a type of Vodka?

The techniques covered in this talk will help you react quickly to changes in your system, and possibly prevent or minimize the impact of outages.

https://phpconference.com/session/statistics-of-web-performance-measurement-and-anomaly-detection/

Philip Tellis

June 01, 2016

Transcript

  1. Web Performance
    Statistics and Anomaly Detection

  2. Philip Tellis
    @bluesmoon
    http://tech.bluesmoon.info
    http://www.soasta.com/mpulse/

  3. 0
    What do we measure?

  4. Perception & Reaction
    • How slow or fast did the page feel for the user?
    • How did the user react to their experience?
    • What in the user's environment influences perceived
    performance?

  5. boomerang
    https://github.com/soasta/boomerang
    This talk will not cover how we measure, it will cover what we
    do once we've collected measurements

  6. 0.1 Perceived performance
    • Measure the time between the user initiating an action
    and that action completing
    • Measure smoothness and responsiveness
    • Measuring performance should not affect
    performance
    • Measure trends over time

  7. 0.2 Reaction
    • How much time do users spend on a site?
    • How many actions does the user take?
    • Did the user bounce or convert?
    • What was the value of the conversion?

  8. 0.3 Environment
    • User Agent, Geography, Network (advertised
    and measured)
    • Page size, DOM nodes, scripts, css, etc.
    • CPU, Screen, Battery, Memory usage

  9. We measure Real User
    Experiences

  10. There's a lot of this and it
    can get noisy

  11. Management expects 1
    number to rule them all
    or better yet, a colour

  12. 1
    Basics:

    Distributions, Summaries & Filtering

  13. 1.1 Log-Normal distribution
    The log() of the x-axis values is Normally distributed; this is the most common case

  14. 1.1 Bi-modal distribution
    What causes this?

  15. 1.1 In Reality
    What looks like a single Log-Normal distribution is
    a sum of multiple slightly different log-normal
    distributions

  16. 1.1 Dimension Split

  17. 1.2 Summarization — One metric to rule them all
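    • One normally uses the Arithmetic Mean:
    $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
    • But for Log-Normal distributions, the Geometric
    mean makes more sense:
    $\left( \prod_{i=1}^{n} x_i \right)^{1/n} = \exp\left( \frac{1}{n} \sum_{i=1}^{n} \ln x_i \right)$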
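
A minimal sketch of the two means on this slide, using only the Python standard library. The sample values and the function names are hypothetical, chosen to show how one slow page load pulls the arithmetic mean up much more than the geometric mean.

```python
import math


def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """exp of the mean of the logs; only defined for strictly positive values."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Hypothetical page load times in milliseconds, with one slow outlier.
load_times_ms = [800, 900, 1000, 1100, 1200, 9000]

print("arithmetic mean:", round(arithmetic_mean(load_times_ms), 1))
print("geometric mean: ", round(geometric_mean(load_times_ms), 1))
```

The geometric mean is computed through logarithms rather than a direct product, so it does not overflow on large datasets.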

  18. • But our distributions are never normal, and often
    not even log-normal
    • And a single number does not tell us if the
    distribution was multi-modal

  19. 1.2 Summarization — Midgard
    • A better metric that works for non-parametric
    distributions is the Median, or 50th percentile,
    defined as the middle point of an ordered
    dataset.
    • Some people like to use higher percentiles like
    the 75th, 95th or 98th, which give them a better
    idea of the worst user experiences
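
As an illustration, the median and higher percentiles can be computed as below. The `percentile` helper uses the nearest-rank definition, which is only one of several common percentile definitions and is an assumption here; the sample data is hypothetical.

```python
import math
import statistics


def percentile(xs, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    ordered = sorted(xs)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical page load times in milliseconds.
load_times_ms = [700, 800, 850, 900, 1000, 1100, 1300, 1800, 2500, 9000]

median = statistics.median(load_times_ms)  # middle point of the ordered data
p75 = percentile(load_times_ms, 75)
p95 = percentile(load_times_ms, 95)
print(median, p75, p95)
```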

  20. 1.2 Spread
    • Apart from central tendency, we also need to know
    the spread of our curve
    • The traditional metric used is the Standard Deviation
    • This is a bad idea:

    https://www.edge.org/response-detail/25401
    • The Mean Absolute Deviation or MAD is a better
    measure

  21. 1.2 Another MAD
    • However, the (Arithmetic) Mean is used for
    Normal distributions.
    • And a Geometric Mean Absolute Deviation is
    asymmetric
    • So we use the Median Absolute Deviation

  22. 1.2 Median Absolute Deviation
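    1. Take the absolute delta of each point with the
    median: $d_i = |x_i - \mathrm{median}(x)|$
    2. Take the median of this deviation: $\mathrm{median}(d)$
    3. Multiply this by 1.4826 to get the MAD (this factor
    makes the MAD comparable to the Standard Deviation
    for Normal data)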

    https://en.wikipedia.org/wiki/Median_absolute_deviation
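
The three steps above can be sketched in a few lines of Python. The function name and the sample data are hypothetical; the 1.4826 factor is the one given on the slide.

```python
import statistics


def median_absolute_deviation(xs, scale=1.4826):
    """Scaled Median Absolute Deviation of a dataset."""
    med = statistics.median(xs)
    abs_deltas = [abs(x - med) for x in xs]       # step 1
    return scale * statistics.median(abs_deltas)  # steps 2 and 3


# Hypothetical page load times in milliseconds, with one slow outlier.
load_times_ms = [800, 900, 1000, 1100, 1200, 9000]

mad = median_absolute_deviation(load_times_ms)
print("MAD:  ", round(mad, 1))
print("stdev:", round(statistics.stdev(load_times_ms), 1))
```

On this sample the single outlier inflates the standard deviation far more than it moves the MAD.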

  23. Then there's the

    Inter Quartile Range
    IQR = Q3 - Q1
    Also known as the Middle 50
    This is used in Box Plots

  24. 1.3 Filtering
    All filtering is Band-Pass; the difference is in how the bands
    are selected

  25. 1.3 Filter Band Selection
    • Static Filtering: Based on static thresholds, useful
    to get rid of absurd, obviously fake data
    • IQR Filtering: Based on the IQR, determines the
    band based on the dataset to eliminate outliers
    • Dynamic Filtering: Adjust based on trends and
    seasonality
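
A minimal sketch of IQR filtering. The transcript does not preserve the band formula from the slide, so this assumes the conventional Tukey fences, Q1 - 1.5 × IQR and Q3 + 1.5 × IQR; the multiplier `k`, the function names and the data are all assumptions. Outliers are returned separately rather than thrown away, so they can still be studied.

```python
import statistics


def iqr_band(xs, k=1.5):
    """Return the (low, high) band derived from the quartiles of xs."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_by_band(xs, k=1.5):
    """Split xs into values inside the band and outliers outside it."""
    low, high = iqr_band(xs, k)
    kept = [x for x in xs if low <= x <= high]
    outliers = [x for x in xs if x < low or x > high]
    return kept, outliers


# Hypothetical page load times in milliseconds.
load_times_ms = [700, 800, 850, 900, 1000, 1100, 1300, 1800, 2500, 9000]
kept, outliers = split_by_band(load_times_ms)
print("kept:", kept)
print("outliers:", outliers)
```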

  26. Don't discard outliers
    study them for patterns

  27. 2
    Advanced:
    Hypothesis Testing, Forecasting &
    Anomaly Detection

  28. 2.1 Kolmogorov-Smirnov

  29. 2.1 Kolmogorov-Smirnov
    • Easily compare two distributions
    • Good for 1-dimensional distributions

    https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
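
To make the comparison concrete, here is a sketch of the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDFs of two samples. It computes only the statistic D, not a p-value; the function name and the sample data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is reached at one of the observed data points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical load times in seconds, before and after a release.
before = [1.0, 2.0, 3.0, 4.0]
after = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(before, after))
```

A value of 0 means the two empirical distributions are identical and 1 means they do not overlap at all. Deciding whether a given D is significant needs the KS distribution, which a statistics library such as SciPy (`scipy.stats.ks_2samp`) provides.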

  30. For comparing more than two
    groups, Kruskal-Wallis
    We won't go into that today
    https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance

  31. We will talk about Nelson Rules
    • Based on the idea that all natural systems have intrinsic
    randomness
    • If the data loses its randomness, something artificial is
    affecting it
    https://en.wikipedia.org/wiki/Nelson_rules

  32. 2.2 Nelson Rules
    • Based on Standard Deviation, which is not great
    • Requires Normal Distribution, which we don't
    have
    • We can solve the first by using MAD instead
    • The second problem requires us to get a little
    creative

  33. Converting a Random Log-Normal like distribution to Normal
    • For a Log-Normal distribution, simply taking the log_e() of
    the random variable is sufficient to get a Normal
    distribution
    • For other cases, we find that looking at the delta between
    points rather than the points themselves generates a
    Normal distribution
    • A series built up from such random deltas is what is
    known as a Random Walk
    • We can refer to the deltas as a first order differential
    between points, although strictly speaking that would
    require a continuous random variable
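
A small sketch of both transformations on synthetic data. The parameters (mu = 7, sigma = 0.5), the seed and the helper name `first_differences` are assumptions made for illustration; real measurements will not be this clean.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic Log-Normal "load times": exp() of a Normal(mu=7, sigma=0.5) variable.
samples = [random.lognormvariate(7.0, 0.5) for _ in range(10_000)]

# Taking the natural log recovers the underlying Normal variable.
logs = [math.log(x) for x in samples]


def first_differences(series):
    """Delta between each point and the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


print("raw:  mean", round(statistics.mean(samples)), "median", round(statistics.median(samples)))
print("logs: mean", round(statistics.mean(logs), 3), "median", round(statistics.median(logs), 3))
print(first_differences([1, 4, 9, 16]))
```

The raw samples are skewed, so their mean sits well above their median, while the mean and median of the logs nearly coincide, as expected for a symmetric Normal distribution.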

  34. Differentials
    Random variable following Log-normal distribution
    Delta of Random variable follows
    a Normal distribution

  35. Nelson Rules
    • So by using a first order differential of our data,
    split by dimension
    • And using the Median Absolute Deviation
    • We get a reasonable idea of whether our data is
    insufficiently random


    • This is good for short time ranges
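
A sketch of how two of the Nelson Rules might be applied to deltas. It swaps the usual mean and Standard Deviation for the median and the scaled MAD, as the slides suggest. Only rule 1 (a point more than 3 deviations from the centre) and rule 2 (nine or more points in a row on the same side of the centre) are shown. The function names and data are hypothetical, and the split by dimension is left out.

```python
import statistics


def deltas(series):
    """First order differences of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def scaled_mad(xs):
    """Median Absolute Deviation scaled by 1.4826."""
    med = statistics.median(xs)
    return 1.4826 * statistics.median([abs(x - med) for x in xs])


def nelson_rule_1(xs, centre, spread):
    """Indexes of points more than 3 spreads away from the centre."""
    return [i for i, x in enumerate(xs) if abs(x - centre) > 3 * spread]


def nelson_rule_2(xs, centre, run=9):
    """Indexes at which `run` or more consecutive points sit on one side of the centre."""
    hits, streak, side = [], 0, 0
    for i, x in enumerate(xs):
        s = (x > centre) - (x < centre)  # +1 above, -1 below, 0 on the centre
        if s != 0 and s == side:
            streak += 1
        else:
            side, streak = s, (1 if s != 0 else 0)
        if streak >= run:
            hits.append(i)
    return hits


# Hypothetical metric that wobbles around 1000 and then jumps.
series = [1000, 1003, 1001, 1002, 998, 1000, 999, 1003,
          1000, 1000, 1002, 1000, 1001, 1041]
d = deltas(series)
centre, spread = statistics.median(d), scaled_mad(d)
print("rule 1:", nelson_rule_1(d, centre, spread))
print("rule 2:", nelson_rule_2(d, centre))
```

An index `i` in the deltas corresponds to the step from `series[i]` to `series[i + 1]`.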

  36. What about long term
    forecasts?

  37. 2.3 What is Triple Exponential Smoothing?
    • A simple moving average is the most basic form of
    smoothing
    • Exponential smoothing extends this to use exponentially
    decreasing weights for past terms
    • Double Exponential Smoothing adds a trend factor, e.g. a
    website that increases in popularity over time
    • Triple Exponential Smoothing adds seasonality, for example,
    changes in traffic over the course of a day, week or year
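
The first three bullets can be sketched as follows. The function names, the initialisation choices and the smoothing factors `alpha` and `beta` are assumptions made for illustration; libraries differ in how they initialise the level and trend.

```python
def moving_average(series, window):
    """Unweighted mean of the last `window` points."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def simple_exponential_smoothing(series, alpha):
    """Each smoothed value mixes the new point with the previous smoothed value."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def double_exponential_smoothing(series, alpha, beta):
    """Track a level and a trend; return both after the last point."""
    level, trend = series[0], series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


# Hypothetical daily visits for a site that grows steadily.
visits = [10, 12, 14, 16]
print(moving_average(visits, 2))
print(simple_exponential_smoothing(visits, alpha=0.5))
level, trend = double_exponential_smoothing(visits, alpha=0.5, beta=0.5)
print("next value forecast:", level + trend)
```

On a steadily growing series, single exponential smoothing lags behind the data, while the version with a trend term keeps up with it.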

  38. Holt-Winters Triple Exponential Smoothing
    • This is great for websites because it can be used
    to forecast traffic based on past trends and
    seasonality
    • For example, predict next Christmas' traffic by
    looking at the last 2 Christmases
    • But we can do better by comparing past forecasts
    with what actually happened, and calculating a
    smoothed tolerance for the forecast

  39. Dynamic Filter Bands based on H-W 3xS
    • Generate smoothed curve using H-W
    • Calculate deltas between past forecasts and what actually happened
    • Generate smoothed delta curve using H-W
    • Apply smoothed deltas to forecast to create tolerance bands
    • If the current measurement falls outside the bands, we have an anomaly, so alert on it
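
A rough sketch of this idea, not the talk's actual implementation. It uses additive Holt-Winters for the forecast and, as a simplification, smooths the absolute forecast error per season slot with plain exponential smoothing instead of a second H-W pass. The function name, all parameter defaults, the `min_band` floor and the data are assumptions.

```python
def holt_winters_anomalies(series, season_len, alpha=0.5, beta=0.1,
                           gamma=0.3, width=3.0, min_band=1.0):
    """Additive Holt-Winters with a smoothed tolerance band.

    The first two seasons initialise the model; every later point is
    compared with its one-step-ahead forecast. Returns a list of
    (index, forecast, band, is_anomaly) tuples.
    """
    m = season_len
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    level = second
    trend = (second - first) / m
    seasonal = [((series[i] - first) + (series[m + i] - second)) / 2
                for i in range(m)]
    deviation = [0.0] * m  # smoothed absolute forecast error per season slot

    results = []
    for t in range(2 * m, len(series)):
        slot, actual = t % m, series[t]
        forecast = level + trend + seasonal[slot]
        band = max(width * deviation[slot], min_band)
        results.append((t, forecast, band, abs(actual - forecast) > band))

        # Update level, trend and seasonality, then the smoothed delta.
        prev_level = level
        level = alpha * (actual - seasonal[slot]) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (actual - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = gamma * abs(actual - forecast) + (1 - gamma) * deviation[slot]
    return results


# Hypothetical traffic with a season of 4 samples and one spike at index 18.
traffic = [10, 20, 30, 20] * 6
traffic[18] = 90
flagged = [t for t, _, _, is_anomaly in holt_winters_anomalies(traffic, season_len=4)
           if is_anomaly]
print(flagged)
```

Because this sketch keeps updating the model with anomalous points, the points right after a spike may be flagged too, since the spike has pulled the level upwards. Whether to exclude flagged points from the update is a design choice left open here.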

  40. How do we determine trend & seasonality factors?
    • Triple Exponential Smoothing requires
    smoothing factors for the trend and seasonality
    • Trend is simple: it's the slope of the curve after
    applying single exponential smoothing
    • Seasonality is harder since it's non-linear

  41. Enter Fourier Analysis
    Typically used in Digital Signal Processing,
    but in both cases we're dealing with Sine waves
    https://en.wikipedia.org/wiki/Fourier_analysis
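
One way Fourier analysis might be applied here is to find the dominant period in a traffic series, which can then serve as the season length. The sketch below uses a plain discrete Fourier transform from the standard library, which is slow but dependency-free. The function name and the synthetic hourly traffic are assumptions.

```python
import cmath
import math


def dominant_period(series):
    """Period (in samples) of the strongest sine component, via a naive DFT."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]  # remove the constant component
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None


# Synthetic hourly traffic over 10 days: a strong daily cycle plus a weaker 6-hour one.
traffic = [100 + 30 * math.sin(2 * math.pi * t / 24) + 5 * math.sin(2 * math.pi * t / 6)
           for t in range(240)]
print(dominant_period(traffic))
```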

  42. Early warnings for some extreme anomalies
    • We recently had a case where traffic went up
    17,000% in a few minutes
    • What could we do to detect this and alert early
    enough?

  43. Anomaly Detection on First & Second Order Differentials
    • First order differential is the rate of change (velocity) of our
    random variable


    • Second order differential is the rate of change of rate of
    change (acceleration) of our random variable


    • Applying the same anomaly detection algorithms to these
    deltas can help us identify when we're moving too fast in the
    wrong direction.
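
A sketch of this approach on a hypothetical requests-per-minute series. The detector is a simple "more than 3 scaled MADs from the median" check, chosen here as a stand-in for whichever anomaly detection algorithm is in use; the function names, the threshold and the data are assumptions.

```python
import statistics


def differences(series):
    """Delta between each point and the one before it."""
    return [b - a for a, b in zip(series, series[1:])]


def robust_outliers(xs, threshold=3.0):
    """Indexes of points more than `threshold` scaled MADs from the median."""
    med = statistics.median(xs)
    mad = 1.4826 * statistics.median([abs(x - med) for x in xs])
    if mad == 0:
        return []
    return [i for i, x in enumerate(xs) if abs(x - med) > threshold * mad]


# Hypothetical requests per minute, ending in a sudden surge.
requests = [100, 104, 101, 105, 103, 107, 104, 108, 180, 900]

velocity = differences(requests)      # first order differential
acceleration = differences(velocity)  # second order differential

print("velocity anomalies:    ", robust_outliers(velocity))
print("acceleration anomalies:", robust_outliers(acceleration))
```

`velocity[i]` describes the step into `requests[i + 1]` and `acceleration[i]` the step into `requests[i + 2]`, so in this example both series already flag the jump to 180, one sample before traffic reaches 900.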

  44. And when the stakeholders just want a colour
    Data Science Workbench

  45. References
    • Log-Normal Distribution: https://en.wikipedia.org/wiki/Log-normal_distribution
    • Multi-modal distribution: https://en.wikipedia.org/wiki/Multimodal_distribution
    • Marsaglia-Polar method for Gaussian random number generation: https://en.wikipedia.org/wiki/Marsaglia_polar_method
    • JavaScript gist to generate Normal & Log-Normal random distributions: https://gist.github.com/bluesmoon/7925696
    • Time to retire the Standard Deviation: https://www.edge.org/response-detail/25401
    • Median Absolute Deviation: https://en.wikipedia.org/wiki/Median_absolute_deviation
    • Inter-quartile Range: https://en.wikipedia.org/wiki/Interquartile_range
    • Kolmogorov-Smirnov Test: https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test
    • Kruskal-Wallis Test: https://en.wikipedia.org/wiki/Kruskal–Wallis_one-way_analysis_of_variance
    • Nelson Rules: https://en.wikipedia.org/wiki/Nelson_rules
    • Random Walk: https://en.wikipedia.org/wiki/Random_walk
    • Moving Average: https://en.wikipedia.org/wiki/Moving_average
    • Exponential Smoothing: https://en.wikipedia.org/wiki/Exponential_smoothing
    • Fourier Analysis: https://en.wikipedia.org/wiki/Fourier_analysis
    • SOASTA mPulse: https://www.soasta.com/performance-monitoring/
    • SOASTA DSWB: https://www.soasta.com/data-science-workbench/
    • boomerang: https://github.com/soasta/boomerang

  46. Further Reading
    • Kruskal-Wallis H Test using SPSS Statistics
    • How to Think like a Data Scientist
    • It Probably Works
    • Regression v/s Curve Fitting
    • Topology looks for the Patterns inside Big
    Data
    • RegTools: A Julia Package for Assisting
    Regression Analysis
    • Automating big-data analysis
    • Animated Math
    • Unsupervised Learning with Even Less
    Supervision Using Bayesian Optimization
    • The tensor renaissance in data science
    • Statisticians issue warning over misuse of P
    values
    • How to share data with a statistician
    • Statistics for Engineers: Applying statistical
    techniques to operations data
    • Calculus Learning Guide
    • Comparing Python Clustering Algorithms
    • Finding surprising patterns in a time series
    database in linear time and space
    • How to build an anomaly detection engine
    with Spark, Akka, and Cassandra
    • Statistics Done Wrong: The woefully
    complete guide
    • Circles, Sines & Signals
    • How to actually learn Data Science
    • The Risky Eclipse of Statisticians
    • Fundamental frequency estimation and
    supervised learning
    • Outlier Detection Gets a Makeover - Surprise
    Discovery in Scientific Big Data

  47. Philip Tellis
    @bluesmoon
    http://tech.bluesmoon.info
    http://www.soasta.com/mpulse/

  48. Image Credits
    • Usain Bolt: https://www.flickr.com/photos/sumofmarc/7795253222/
    • 100m dash: http://www.nytimes.com/interactive/2012/08/05/sports/olympics/the-100-meter-dash-one-race-every-medalist-ever.html
    • Kilroy Schematic: http://en.wikipedia.org/wiki/File:KilroySchematic.svg
    • Memes from ImgFlip
