Statistics of Web Performance Measurement and Anomaly Detection

The first step to improving any process or product is to measure how it performs. The second step is data analysis. In this talk, we will analyse page load time experienced by real users and associated business metrics. We will look at various statistical methods and algorithms that may be used, and figure out the ones that make the most sense given the dataset we have. Should we use the mean, median or mode? Is our distribution Normal, Log-Normal or something else? What is the system’s steady-state, and how will I know if it deviates? Can my alerts adapt to the seasonality of my traffic? What is Holt-Winters, how come Nelson Rules and is Kolmogorov-Smirnov a type of Vodka?

The techniques covered in this talk will help you react quickly to changes in your system, and possibly prevent or minimize the impact of outages.

https://phpconference.com/session/statistics-of-web-performance-measurement-and-anomaly-detection/

Philip Tellis

June 01, 2016

Transcript

  1. Web Performance Statistics and Anomaly Detection

  2. Philip Tellis @bluesmoon http://tech.bluesmoon.info http://www.soasta.com/mpulse/

  3. 0 What do we measure?

  4. Perception & Reaction • How slow or fast did the

    page feel for the user? • How did the user react to their experience? • The user's environment that influences perceived performance
  5. boomerang https://github.com/soasta/boomerang This talk will not cover how we measure,

    it will cover what we do once we've collected measurements
  6. 0.1 Perceived performance • Measure the time from user initiating

    an action and that action completing • Measure smoothness and responsiveness • Measuring performance should not affect performance • Measure trends over time
  7. None
  8. 0.2 Reaction • How much time do users spend on

    a site? • How many actions does the user take? • Did the user bounce or convert? • What was the value of the conversion?
  9. 0.3 Environment • User Agent, Geography, Network (advertised and measured)

    • Page size, DOM nodes, scripts, css, etc. • CPU, Screen, Battery, Memory usage
  10. We measure Real User Experiences

  11. There's a lot of this and it can get noisy

  12. Management expects 1 number to rule them all or better

    yet, a colour
  13. 1 Basics:
 Distributions, Summaries & Filtering

  14. 1.1 Log-Normal distribution log() of x-axis is Normal in nature,

    this is most common
  15. 1.1 Bi-modal distribution What causes this?

  16. 1.1 In reality What looks like a single Log-Normal distribution

    is a sum of multiple slightly different log-normal distributions
  17. 1.1 Dimension Split

  18. 1.2 Summarization — One metric to rule them all •

    One normally uses the Arithmetic Mean:
 
 • But for Log-Normal distributions, the Geometric mean makes more sense:
  19. • But our distributions are never normal, and often not

    even log-normal • And a single number does not tell us if the distribution was multi-modal
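
The two means named on slide 18 can be sketched as follows. This is a minimal illustration, not the speaker's code: the function names and the sample load times are made up. The geometric mean is computed as the exponential of the mean of the logs, which is why it suits log-normal data, and it assumes every value is strictly positive.

```python
import math


def arithmetic_mean(xs):
    """Sum of the values divided by their count."""
    return sum(xs) / len(xs)


def geometric_mean(xs):
    """n-th root of the product, computed via logs to avoid overflow.

    Requires every value to be > 0 (true for load times).
    """
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Hypothetical page load times in milliseconds, with one very slow load.
load_times_ms = [100, 1000, 10000]

print(arithmetic_mean(load_times_ms))  # pulled towards the slow outlier
print(geometric_mean(load_times_ms))   # stays near the middle on a log scale
```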
  20. 1.2 Summarization — Midgard • A better metric that works

    for non-parametric distributions is the Median, or 50th percentile, defined as the middle point of an ordered dataset. • Some people like to use higher percentiles like the 75th, 95th or 98th, which give them a better idea of the worst user experiences
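
A rough sketch of the median and higher percentiles, assuming the nearest-rank definition of a percentile (other definitions interpolate between points, so results can differ slightly, especially on small samples). The `percentile` helper and the sample data are illustrative only.

```python
import math
import statistics


def percentile(xs, p):
    """Nearest-rank percentile: the smallest value with at least p% of
    the data at or below it. p is in the range (0, 100]."""
    ordered = sorted(xs)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical page load times in milliseconds.
load_times_ms = [120, 150, 180, 200, 250, 300, 400, 800, 1500, 6000]

print(statistics.median(load_times_ms))  # middle of the ordered data
for p in (75, 95, 98):
    print(p, percentile(load_times_ms, p))
```

Note that for an even number of points `statistics.median` averages the two middle values, while the nearest-rank 50th percentile returns the lower of the two.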
  21. 1.2 Spread • Apart from central tendency, we also need

    to know the spread of our curve • The traditional metric used is the Standard Deviation • This is a bad idea:
 https://www.edge.org/response-detail/25401 • The Mean Absolute Deviation or MAD is a better measure
  22. 1.2 Another MAD • However, the (Arithmetic) Mean is used

    for Normal distributions. • And a Geometric Mean Absolute Deviation is asymmetric • So we use the Median Absolute Deviation
  23. 1.2 Median Absolute Deviation 1. Take the absolute delta of

    each point with the median:
 2. Take the median of this deviation: 3. Multiply this by 1.4826 to get the MAD
 https://en.wikipedia.org/wiki/Median_absolute_deviation
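
The three steps on this slide can be sketched as below. The function name is an assumption; the constant 1.4826 comes from the slide and scales the result so that it is comparable to a standard deviation when the data happens to be Normal.

```python
import statistics


def median_absolute_deviation(xs, scale=1.4826):
    """Scaled Median Absolute Deviation, following the three steps above."""
    med = statistics.median(xs)
    # Step 1: absolute delta of each point from the median.
    abs_deltas = [abs(x - med) for x in xs]
    # Step 2: median of those deltas.
    raw_mad = statistics.median(abs_deltas)
    # Step 3: scale by 1.4826.
    return scale * raw_mad


# Illustrative data: one extreme value barely moves the result.
print(median_absolute_deviation([1, 2, 3, 4, 100]))
```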
  24. Then there's the
 Inter Quartile Range IQR = Q3 -

    Q1 Also known as the Middle 50 This is used in Box Plots
  25. 1.3 Filtering All filtering is Band-Pass, the difference is in

    how the bands are selected
  26. 1.3 Filter Band Selection • Static Filtering: Based on static

    thresholds, useful to get rid of absurd, obviously fake data • IQR Filtering: Based on the IQR, determines the band based on the dataset to eliminate outliers
 
 • Dynamic Filtering: Adjust based on trends and seasonality
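
One way IQR filtering might look in practice. The slides do not say how wide the band is, so this sketch assumes the common convention of 1.5 × IQR beyond each quartile (the same fences used for box plot whiskers). The function name and data are hypothetical. It returns the outliers instead of throwing them away so they can still be studied.

```python
import statistics


def iqr_filter(xs, k=1.5):
    """Split xs into (kept, outliers) using a band of k * IQR around Q1..Q3.

    k=1.5 is an assumed, conventional choice, not something from the talk.
    """
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [x for x in xs if low <= x <= high]
    outliers = [x for x in xs if x < low or x > high]
    return kept, outliers


# Hypothetical load times in milliseconds with one absurd value.
kept, outliers = iqr_filter([120, 150, 180, 200, 250, 300, 400, 800, 1500, 60000])
print(kept)
print(outliers)
```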
  27. Don't discard outliers; study them for patterns

  28. 2 Advanced: Hypothesis Testing, Forecasting & Anomaly Detection

  29. 2.1 Kolmogorov-Smirnov

  30. 2.1 Kolmogorov-Smirnov • Easily compare two distributions • Good for

    1-dimensional distributions
 
 
 
 
 
 
 https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
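
A minimal sketch of the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical cumulative distribution functions of two samples. The function name is an assumption, and this version only computes the statistic, not a p-value; in practice a library routine such as `scipy.stats.ks_2samp` would likely be used instead.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every point from either sample.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# 0.0 means the samples look identical; 1.0 means they do not overlap at all.
print(ks_statistic([1, 2, 3, 4], [1, 2, 3, 4]))
print(ks_statistic([1, 2, 3], [10, 11, 12]))
```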
  31. For multi-dimensional data, Kruskal-Wallis We won't go into that today

    https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance
  32. We will talk about Nelson Rules • Based on the

    idea that all natural systems have intrinsic randomness • If the data loses its randomness, something artificial is affecting it https://en.wikipedia.org/wiki/Nelson_rules
  33. 2.2 Nelson Rules • Based on Standard Deviation, which is

    not great • Requires Normal Distribution, which we don't have • We can solve the first by using MAD instead • The second problem requires us to get a little creative
  34. Converting a Random Log-Normal like distribution to Normal • For

    a Log-Normal distribution, simply taking the log_e() of the random variable is sufficient to get a Normal distribution • For other cases, we find that looking at the delta between points rather than the points themselves generates a Normal distribution • The deltas form what is known as a Random Walk • We can refer to the deltas as a first order differential between points, although strictly speaking that would require a continuous random variable
  35. Differentials Random variable following Log-normal distribution Delta of Random variable

    follows a Normal distribution
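
Both transformations described on slide 34 are short enough to sketch directly. The function names are illustrative, and the log transform assumes strictly positive values.

```python
import math


def log_transform(xs):
    """Natural log of each value; requires every value to be > 0."""
    return [math.log(x) for x in xs]


def first_differences(xs):
    """Delta between each point and the one before it (one item shorter)."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


# Hypothetical per-minute median load times in milliseconds.
series = [850, 870, 860, 900, 880]
print(log_transform(series))
print(first_differences(series))
```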
  36. Nelson Rules • So by using a first order differential

    of our data, split by dimension • And using the Median Absolute Deviation • We get a reasonable idea of whether our data is insufficiently random
 
 • This is good for short time ranges
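
As a rough illustration, here are two of the eight Nelson Rules with the substitutions the slides suggest: the median stands in for the mean and the scaled Median Absolute Deviation for the standard deviation. Rule 1 flags a point more than three deviations from the centre; rule 2 flags nine or more points in a row on the same side of the centre. The function names are assumptions, and the input is expected to be the deltas of one dimension's data.

```python
import statistics


def center_and_spread(xs):
    """Median and scaled MAD, used in place of mean and standard deviation."""
    med = statistics.median(xs)
    mad = 1.4826 * statistics.median([abs(x - med) for x in xs])
    return med, mad


def nelson_rule_1(xs, center, spread):
    """Indices of points more than 3 spreads away from the center."""
    return [i for i, x in enumerate(xs) if abs(x - center) > 3 * spread]


def nelson_rule_2(xs, center, run_length=9):
    """Indices where a run of run_length+ points on one side is in progress."""
    hits, side, count = [], 0, 0
    for i, x in enumerate(xs):
        current = (x > center) - (x < center)  # +1 above, -1 below, 0 on it
        if current != 0 and current == side:
            count += 1
        else:
            side, count = current, (1 if current != 0 else 0)
        if count >= run_length:
            hits.append(i)
    return hits


deltas = [0.5, -0.5, 1, -1, 0.5, -0.5, 1, -1, 40]  # made-up deltas
center, spread = center_and_spread(deltas)
print(nelson_rule_1(deltas, center, spread))
print(nelson_rule_2(deltas, center))
```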
  37. What about long term forecasts?

  38. None
  39. 2.3 What is Triple Exponential Smoothing? • A simple moving

    average is the most basic form of smoothing • Exponential smoothing extends this to use exponentially decreasing weights for past terms • Double Exponential Smoothing adds a trend factor, e.g. a website increases in popularity over time • Triple Exponential Smoothing adds seasonality, for example, changes in traffic over the course of a day, week or year
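
The first two forms of smoothing on this slide can be sketched as below. The names and the choice to seed the smoothed series with the first observation are assumptions; `alpha` is the smoothing factor between 0 and 1, where higher values react faster to recent data.

```python
def moving_average(xs, window):
    """Simple moving average: unweighted mean of each run of `window` points."""
    return [sum(xs[i:i + window]) / window
            for i in range(len(xs) - window + 1)]


def exponential_smoothing(xs, alpha):
    """Single exponential smoothing.

    Each output is alpha * current value + (1 - alpha) * previous output,
    so older points get exponentially smaller weights.
    """
    smoothed = [xs[0]]  # assumption: seed with the first observation
    for x in xs[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical hourly page view counts.
views = [100, 120, 90, 130, 110]
print(moving_average(views, 3))
print(exponential_smoothing(views, 0.5))
```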
  40. Holt-Winters Triple Exponential Smoothing • This is great for websites

    because it can be used to forecast traffic based on past trends and seasonality • For example, predict next Christmas' traffic by looking at the last 2 Christmases • But we can do better by comparing past forecasts with what actually happened, and calculating a smoothed tolerance for the forecast
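
A compact sketch of the additive form of Holt-Winters, producing one-step-ahead forecasts. Everything here is a simplifying assumption, not the speaker's implementation: the additive (not multiplicative) seasonality, the initialisation from the first two seasons, and the function and parameter names. `alpha`, `beta` and `gamma` are the smoothing factors for level, trend and seasonality.

```python
def holt_winters_additive(ys, period, alpha, beta, gamma):
    """One-step-ahead forecasts for ys[period:] using additive Holt-Winters."""
    if len(ys) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first, second = ys[:period], ys[period:2 * period]
    # Assumed initialisation: level is the first season's mean, trend is the
    # average per-step change between the first two seasons.
    level = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2
    seasonals = [y - level for y in first]

    forecasts = []
    for t in range(period, len(ys)):
        last_seasonal = seasonals[t % period]
        forecasts.append(level + trend + last_seasonal)
        # Update the three components with the newly observed value.
        previous_level = level
        level = alpha * (ys[t] - last_seasonal) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonals[t % period] = gamma * (ys[t] - level) + (1 - gamma) * last_seasonal
    return forecasts


# Made-up traffic with a repeating 4-step season and no trend.
traffic = [10, 20, 30, 20] * 3
print(holt_winters_additive(traffic, period=4, alpha=0.5, beta=0.5, gamma=0.5))
```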
  41. Dynamic Filter Bands based on H-W 3xS • Generate smoothed

    curve using H-W • Calculate deltas of past forecasts • Generate smoothed delta curve using H-W • Apply smoothed deltas to forecast to create tolerance bands • If the current measurement falls outside the bands, we have an anomaly, so alert on it
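
The band idea can be sketched as follows, with one loud simplification: the slide smooths the forecast deltas with Holt-Winters, while this sketch uses plain single exponential smoothing of the absolute forecast error to keep it short. The function name, `alpha` and `width` (how many smoothed deviations wide the band is) are all hypothetical. The band at each step is built only from earlier errors, so an anomaly cannot widen its own band.

```python
def flag_anomalies(actuals, forecasts, alpha=0.3, width=3.0):
    """Indices where the actual value falls outside forecast +/- width * deviation.

    `deviation` is an exponentially smoothed absolute forecast error
    (a simplification of the smoothed delta curve described above).
    """
    flagged = []
    deviation = None
    for i, (actual, forecast) in enumerate(zip(actuals, forecasts)):
        error = abs(actual - forecast)
        # Compare against the band built from *previous* errors only.
        if deviation is not None and error > width * deviation:
            flagged.append(i)
        # Then fold the new error into the smoothed deviation.
        if deviation is None:
            deviation = error
        else:
            deviation = alpha * error + (1 - alpha) * deviation
    return flagged


# Made-up measurements and forecasts; the fifth measurement is a spike.
actuals = [100, 102, 98, 101, 300, 100]
forecasts = [101, 100, 100, 100, 100, 100]
print(flag_anomalies(actuals, forecasts))
```

If past forecasts were perfect the smoothed deviation is zero and any error at all gets flagged, so a real system would probably want a minimum band width.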
  42. How do we determine trend & seasonality factors? • Triple

    Exponential Smoothing requires smoothing factors for the trend and seasonality • Trend is simple, it's the slope of the curve after applying single exponential smoothing • Seasonality is harder since it's non-linear
  43. Enter Fourier Analysis Typically used in Digital Signal Processing,
 but

    in both cases we're dealing with Sine waves https://en.wikipedia.org/wiki/Fourier_analysis
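
One possible use of Fourier analysis here is to estimate the length of the season from the data itself. The sketch below is an assumption about how that might be done, not the method from the talk: it runs a naive discrete Fourier transform (quadratic time, fine for small inputs) and reports the period of the strongest frequency. The function name and the synthetic data are illustrative.

```python
import cmath
import math


def dominant_period(xs):
    """Period (in samples) of the strongest frequency in xs, or None."""
    n = len(xs)
    mean = sum(xs) / n
    centered = [x - mean for x in xs]  # remove the constant component
    best_k, best_magnitude = 0, 0.0
    for k in range(1, n // 2 + 1):
        # k-th DFT coefficient: correlation with a wave of k cycles per n samples.
        coefficient = sum(value * cmath.exp(-2j * math.pi * k * t / n)
                          for t, value in enumerate(centered))
        if abs(coefficient) > best_magnitude:
            best_k, best_magnitude = k, abs(coefficient)
    return n / best_k if best_k else None


# Synthetic hourly traffic: four days with a 24-hour cycle.
hourly = [100 + 20 * math.sin(2 * math.pi * t / 24) for t in range(96)]
print(dominant_period(hourly))
```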
  44. Early warnings for some extreme anomalies • We recently had

    a case where traffic went up 17,000% in a few minutes • What could we do to detect this and alert early enough?
  45. Anomaly Detection on First & Second Order Differentials • First

    order differential is the rate of change (velocity) of our random variable
 
 • Second order differential is the rate of change of rate of change (acceleration) of our random variable
 
 • Applying the same anomaly detection algorithms to these deltas can help us identify when we're moving too fast in the wrong direction.
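
A small sketch of the velocity and acceleration idea, using median and scaled MAD as the detection step. The names, the threshold of three deviations and the sample numbers are all assumptions for illustration.

```python
import statistics


def differences(xs):
    """Delta between consecutive points."""
    return [later - earlier for earlier, later in zip(xs, xs[1:])]


def robust_outliers(xs, k=3.0):
    """Indices of points more than k scaled MADs from the median."""
    med = statistics.median(xs)
    mad = 1.4826 * statistics.median([abs(x - med) for x in xs])
    return [i for i, x in enumerate(xs) if abs(x - med) > k * mad]


# Made-up requests per minute with a sudden surge at the end.
traffic = [100, 102, 101, 103, 102, 104, 17000]

velocity = differences(traffic)        # first order differential
acceleration = differences(velocity)   # second order differential

print(robust_outliers(velocity))
print(robust_outliers(acceleration))
```

Each differencing step shortens the series by one, so index i in `velocity` describes the change between points i and i + 1 of the original data.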
  46. And when the stakeholders just want a colour Data Science

    Workbench
  47. References • Log-Normal Distribution: https://en.wikipedia.org/wiki/Log-normal_distribution • Multi-modal distribution: https://en.wikipedia.org/wiki/Multimodal_distribution •

    Marsaglia-Polar method for Gaussian random number generation:
 https://en.wikipedia.org/wiki/Marsaglia_polar_method • JavaScript gist to generate Normal & Log-Normal random distributions
 https://gist.github.com/bluesmoon/7925696 • Time to retire the Standard Deviation: https://www.edge.org/response-detail/25401 • Median Absolute Deviation: https://en.wikipedia.org/wiki/Median_absolute_deviation • Inter-quartile Range: https://en.wikipedia.org/wiki/Interquartile_range • Kolmogorov-Smirnov Test: https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test • Kruskal-Wallis Test: 
 https://en.wikipedia.org/wiki/Kruskal–Wallis_one-way_analysis_of_variance • Nelson Rules: https://en.wikipedia.org/wiki/Nelson_rules • Random Walk: https://en.wikipedia.org/wiki/Random_walk • Moving Average: https://en.wikipedia.org/wiki/Moving_average • Exponential Smoothing: https://en.wikipedia.org/wiki/Exponential_smoothing • Fourier Analysis: https://en.wikipedia.org/wiki/Fourier_analysis • SOASTA mPulse: https://www.soasta.com/performance-monitoring/ • SOASTA DSWB: https://www.soasta.com/data-science-workbench/ • boomerang: https://github.com/soasta/boomerang
  48. Further Reading • Kruskal-Wallis H Test using SPSS Statistics •

    How to Think like a Data Scientist • It Probably Works • Regression v/s Curve Fitting • Topology looks for the Patterns inside Big Data • RegTools: A Julia Package for Assisting Regression Analysis • Automating big-data analysis • Animated Math • Unsupervised Learning with Even Less Supervision Using Bayesian Optimization • The tensor renaissance in data science • Statisticians issue warning over misuse of P values • How to share data with a statistician • Statistics for Engineers: Applying statistical techniques to operations data • Calculus Learning Guide • Comparing Python Clustering Algorithms • Finding surprising patterns in a time series database in linear time and space • How to build an anomaly detection engine with Spark, Akka, and Cassandra • Statistics Done Wrong: The woefully complete guide • Circles, Sines & Signals • How to actually learn Data Science • The Risky Eclipse of Statisticians • Fundamental frequency estimation and supervised learning • Outlier Detection Gets a Makeover - Surprise Discovery in Scientific Big Data
  49. Thank You

  50. Philip Tellis @bluesmoon http://tech.bluesmoon.info http://www.soasta.com/mpulse/

  51. Image Credits • Usain Bolt: 
 https://www.flickr.com/photos/sumofmarc/7795253222/ • 100m dash:


    http://www.nytimes.com/interactive/2012/08/05/sports/olympics/the-100-meter-dash-one-race-every-medalist-ever.html • Kilroy Schematic:
 http://en.wikipedia.org/wiki/File:KilroySchematic.svg • Memes from ImgFlip