Statistics of Web Performance Measurement and Anomaly Detection

Slide 1

Slide 1 text

Web Performance Statistics and Anomaly Detection

Slide 2

Slide 2 text

Philip Tellis @bluesmoon http://tech.bluesmoon.info http://www.soasta.com/mpulse/

Slide 3

Slide 3 text

0 What do we measure?

Slide 4

Slide 4 text

Perception & Reaction • How slow or fast did the page feel for the user? • How did the user react to their experience? • The user's environment that influences perceived performance

Slide 5

Slide 5 text

boomerang https://github.com/soasta/boomerang This talk will not cover how we measure; it covers what we do once we've collected measurements

Slide 6

Slide 6 text

0.1 Perceived performance • Measure the time between the user initiating an action and that action completing • Measure smoothness and responsiveness • Measuring performance should not affect performance • Measure trends over time

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

0.2 Reaction • How much time do users spend on a site? • How many actions does the user take? • Did the user bounce or convert? • What was the value of the conversion?

Slide 9

Slide 9 text

0.3 Environment • User Agent, Geography, Network (advertised and measured) • Page size, DOM nodes, scripts, css, etc. • CPU, Screen, Battery, Memory usage

Slide 10

Slide 10 text

We measure Real User Experiences

Slide 11

Slide 11 text

There's a lot of this and it can get noisy

Slide 12

Slide 12 text

Management expects 1 number to rule them all or better yet, a colour

Slide 13

Slide 13 text

1 Basics:  Distributions, Summaries & Filtering

Slide 14

Slide 14 text

1.1 Log-Normal distribution • The log() of the x-axis values is Normally distributed • This is the most common case

Slide 15

Slide 15 text

1.1 Bi-modal distribution What causes this?

Slide 16

Slide 16 text

1.1 In Reality What looks like a single Log-Normal distribution is a sum of multiple slightly different Log-Normal distributions

Slide 17

Slide 17 text

1.1 Dimension Split

Slide 18

Slide 18 text

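1.2 Summarization: One metric to rule them all • One normally uses the Arithmetic Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ • But for Log-Normal distributions, the Geometric Mean makes more sense: $\bar{x}_g = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \exp\left(\frac{1}{n}\sum_{i=1}^{n} \ln x_i\right)$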
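A minimal sketch of the two means, to show how far apart they land on skewed data. The sample values and function names are made up for illustration; they are not from the talk.

```python
import math

def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

def geometric_mean(values):
    """exp() of the mean of the logs; only defined for strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical page load times in milliseconds, spread over two orders of magnitude.
load_times_ms = [100, 1000, 10000]

am = arithmetic_mean(load_times_ms)  # dominated by the slowest sample
gm = geometric_mean(load_times_ms)   # sits in the middle on a log scale
```

On this toy data the arithmetic mean is pulled towards the single slow sample, while the geometric mean matches the middle value.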

Slide 19

Slide 19 text

• But our distributions are never normal, and often not even log-normal • And a single number does not tell us if the distribution was multi-modal

Slide 20

Slide 20 text

1.2 Summarization: Midgard • A better metric that works for non-parametric distributions is the Median, or 50th percentile, defined as the middle point of an ordered dataset. • Some people prefer higher percentiles like the 75th, 95th or 98th, which give them a better idea of the worst user experiences
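A small sketch of percentile calculation using the nearest-rank definition, which is one of several common definitions and an assumption here. For an even number of points, this definition returns the lower of the two middle values rather than their average. The sample data is hypothetical.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("need at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical load times in milliseconds, with a long tail.
samples = [120, 150, 180, 200, 220, 250, 300, 400, 900, 5000]

p50 = percentile(samples, 50)
p75 = percentile(samples, 75)
p95 = percentile(samples, 95)
```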

Slide 21

Slide 21 text

1.2 Spread • Apart from central tendency, we also need to know the spread of our curve • The traditional metric used is the Standard Deviation • This is a bad idea:  https://www.edge.org/response-detail/25401 • The Mean Absolute Deviation or MAD is a better measure

Slide 22

Slide 22 text

1.2 Another MAD • However, the (Arithmetic) Mean is used for Normal distributions. • And a Geometric Mean Absolute Deviation is asymmetric • So we use the Median Absolute Deviation

Slide 23

Slide 23 text

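1.2 Median Absolute Deviation 1. Take the absolute delta of each point with the median: $d_i = |x_i - \mathrm{median}(x)|$ 2. Take the median of these deviations: $\mathrm{median}(d)$ 3. Multiply this by 1.4826 to get the MAD: $\mathrm{MAD} = 1.4826 \cdot \mathrm{median}(d)$ (the constant makes the MAD comparable to the Standard Deviation for Normally distributed data) https://en.wikipedia.org/wiki/Median_absolute_deviation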
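The three steps above can be sketched as follows. The function name and the sample data are illustrative assumptions.

```python
import statistics

def median_absolute_deviation(values, scale=1.4826):
    """Scaled median of the absolute deviations from the median."""
    center = statistics.median(values)
    # Step 1: absolute delta of each point with the median.
    abs_deltas = [abs(v - center) for v in values]
    # Steps 2 and 3: median of the deltas, multiplied by the scale constant.
    return scale * statistics.median(abs_deltas)

# One extreme point barely moves the MAD, unlike the Standard Deviation.
data = [1, 2, 3, 4, 100]
mad = median_absolute_deviation(data)
stdev = statistics.stdev(data)
```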

Slide 24

Slide 24 text

Then there's the Inter Quartile Range: IQR = Q3 - Q1. Also known as the Middle 50. This is used in Box Plots.

Slide 25

Slide 25 text

1.3 Filtering All filtering is Band-Pass, the difference is in how the bands are selected

Slide 26

Slide 26 text

1.3 Filter Band Selection • Static Filtering: Based on static thresholds, useful to get rid of absurd, obviously fake data • IQR Filtering: Based on the IQR, determines the band based on the dataset to eliminate outliers    • Dynamic Filtering: Adjust based on trends and seasonality
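A sketch of IQR filtering. The slide does not say how the band is derived from the IQR, so this assumes the widely used Tukey fences, Q1 - 1.5 x IQR and Q3 + 1.5 x IQR; the multiplier `k` and the function names are assumptions. Outliers are returned rather than dropped so they can be studied separately.

```python
import statistics

def iqr_band(values, k=1.5):
    """Return (low, high) band limits derived from the quartiles of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_by_band(values, k=1.5):
    """Split values into those inside the band and those outside it."""
    low, high = iqr_band(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# Hypothetical load times in milliseconds with one suspicious sample.
samples = [100, 110, 120, 130, 140, 150, 160, 170, 5000]
band = iqr_band(samples)
kept, outliers = split_by_band(samples)
```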

Slide 27

Slide 27 text

Don't discard outliers study them for patterns

Slide 28

Slide 28 text

2 Advanced: Hypothesis Testing, Forecasting & Anomaly Detection

Slide 29

Slide 29 text

2.1 Kolmogorov-Smirnov

Slide 30

Slide 30 text

2.1 Kolmogorov-Smirnov • Easily compare two distributions • Good for 1-dimensional distributions              https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
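A minimal sketch of the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical cumulative distribution functions. Turning the statistic into a p-value is left out, and the function name is an assumption.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # Both CDFs are step functions, so the gap only changes at observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])
disjoint = ks_statistic([1, 2, 3], [10, 11, 12])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
```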

Slide 31

Slide 31 text

For comparing more than two samples at once, there is the Kruskal-Wallis test. We won't go into that today https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance

Slide 32

Slide 32 text

We will talk about Nelson Rules • Based on the idea that all natural systems have intrinsic randomness • If the data loses its randomness, something artificial is affecting it https://en.wikipedia.org/wiki/Nelson_rules

Slide 33

Slide 33 text

2.2 Nelson Rules • Based on Standard Deviation, which is not great • Requires Normal Distribution, which we don't have • We can solve the first by using MAD instead • The second problem requires us to get a little creative

Slide 34

Slide 34 text

Converting a Random Log-Normal like distribution to Normal • For a Log-Normal distribution, simply taking the natural logarithm, ln(), of the random variable is sufficient to get a Normal distribution • For other cases, we find that looking at the delta between points rather than the points themselves generates a Normal distribution • A series whose deltas are independent random steps is known as a Random Walk • We can refer to the deltas as a first order differential between points, although strictly speaking that would require a continuous random variable
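A small sketch of both transformations. The generated data, the parameters (mu = 7.0, sigma = 0.5), and the helper names are assumptions for illustration. It only demonstrates the log transform on synthetic Log-Normal data; whether deltas of real measurements look Normal has to be checked against the data itself.

```python
import math
import random
import statistics

def log_transform(values):
    """Natural log of each value; values must be strictly positive."""
    return [math.log(v) for v in values]

def deltas(values):
    """Difference between each point and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]

# Synthetic Log-Normal samples from a seeded generator, so the run is repeatable.
rng = random.Random(42)
samples = [rng.lognormvariate(7.0, 0.5) for _ in range(10_000)]
logs = log_transform(samples)

# Skewed data: the mean sits well above the median.
raw_ratio = statistics.mean(samples) / statistics.median(samples)
# After the log transform, mean and median nearly coincide, as for a symmetric curve.
log_gap = statistics.mean(logs) - statistics.median(logs)
```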

Slide 35

Slide 35 text

Differentials • Random variable following a Log-Normal distribution • Delta of the random variable follows a Normal distribution

Slide 36

Slide 36 text

Nelson Rules • So by using a first order differential of our data, split by dimension • And using the Median Absolute Deviation • We get a reasonable idea of whether our data is insufficiently random    • This is good for short time ranges
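A hedged sketch of the approach above, covering only the first two Nelson rules: a point far from the center, and a long run on one side of the center. It uses the median as the center and the Median Absolute Deviation as the spread. The thresholds (3 spreads, 9 points) follow the usual statement of the rules; the function names and data are assumptions, and splitting by dimension is left out.

```python
import statistics

def deltas(values):
    """First order differences between consecutive points."""
    return [b - a for a, b in zip(values, values[1:])]

def center_and_spread(values):
    """Median and scaled Median Absolute Deviation of the values."""
    center = statistics.median(values)
    spread = 1.4826 * statistics.median([abs(v - center) for v in values])
    return center, spread

def rule_1(values, center, spread, limit=3.0):
    """Indexes of points more than `limit` spreads away from the center."""
    return [i for i, v in enumerate(values) if abs(v - center) > limit * spread]

def rule_2(values, center, run_length=9):
    """Indexes at which `run_length` or more consecutive points sit on one side of the center."""
    hits, run, previous_side = [], 0, 0
    for i, v in enumerate(values):
        side = (v > center) - (v < center)  # +1 above, -1 below, 0 on the center
        run = run + 1 if side != 0 and side == previous_side else abs(side)
        previous_side = side
        if run >= run_length:
            hits.append(i)
    return hits

# Hypothetical metric with one sudden spike.
series = [100, 102, 99, 101, 100, 103, 98, 100, 101, 400, 101, 99]
steps = deltas(series)
center, spread = center_and_spread(steps)
spikes = rule_1(steps, center, spread)
```

Here `spikes` holds indexes into `steps`, so index `i` refers to the change between `series[i]` and `series[i + 1]`.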

Slide 37

Slide 37 text

What about long term forecasts?

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

2.3 What is Triple Exponential Smoothing? • A simple moving average is the most basic form of smoothing • Exponential smoothing extends this to use exponentially decreasing weights for past terms • Double Exponential Smoothing adds a trend factor, eg: website increases in popularity over time • Triple Exponential Smoothing adds seasonality, for example, changes in traffic over the course of a day, week or year
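A minimal sketch of the first two bullets: a simple moving average and single exponential smoothing. The function names and the smoothing factor `alpha` are assumptions; trend and seasonality are not handled here.

```python
def simple_moving_average(series, window):
    """Mean of the last `window` points, for every position where a full window exists."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def exponential_smoothing(series, alpha):
    """Each output blends the new point with the previous output, so old points fade exponentially."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

averaged = simple_moving_average([1, 2, 3, 4, 5], 3)
smoothed = exponential_smoothing([10, 20, 20], 0.5)
```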

Slide 40

Slide 40 text

Holt-Winters Triple Exponential Smoothing • This is great for websites because it can be used to forecast traffic based on past trends and seasonality • For example, predict next Christmas' traffic by looking at the last 2 Christmases • But we can do better by comparing past forecasts with what actually happened, and calculating a smoothed tolerance for the forecast
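A compact sketch of additive Holt-Winters forecasting. The initialization (level from the first season, trend from the difference between the first two seasons) is one common choice and an assumption here, as are the parameter names. The multiplicative variant and parameter fitting are not covered.

```python
def holt_winters_additive(series, season_length, alpha, beta, gamma, horizon):
    """Forecast `horizon` points ahead using level, trend and additive seasonal terms."""
    m = season_length
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    first, second = series[:m], series[m:2 * m]
    level = sum(first) / m
    trend = (sum(second) - sum(first)) / (m * m)  # average per-step change between seasons
    seasonal = [x - level for x in first]
    for t in range(m, len(series)):
        x = series[t]
        previous_level = level
        old_seasonal = seasonal[t % m]
        level = alpha * (x - old_seasonal) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        seasonal[t % m] = gamma * (x - level) + (1 - gamma) * old_seasonal
    n = len(series)
    return [level + h * trend + seasonal[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]

# Hypothetical traffic with a repeating 4-step pattern and no trend.
traffic = [10, 20, 30, 20] * 6
forecast = holt_winters_additive(traffic, season_length=4,
                                 alpha=0.5, beta=0.5, gamma=0.5, horizon=4)
```

On this perfectly periodic series the forecast simply repeats the seasonal pattern.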

Slide 41

Slide 41 text

Dynamic Filter Bands based on H-W 3xS • Generate smoothed curve using H-W • Calculate deltas of past forecasts • Generate smoothed delta curve using H-W • Apply smoothed deltas to forecast to create tolerance bands • If the current measurement falls outside the bands, we have an anomaly, so alert on it
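A simplified sketch of the band logic. To stay short it smooths the past forecast errors with single exponential smoothing instead of Holt-Winters, which is a stand-in and not the method on the slide. The band width multiplier, the function names, and the numbers are all assumptions.

```python
def smoothed_abs_errors(actuals, forecasts, gamma=0.5):
    """Exponentially smoothed absolute difference between past forecasts and what actually happened."""
    errors = [abs(a - f) for a, f in zip(actuals, forecasts)]
    if not errors:
        raise ValueError("need at least one past forecast")
    smoothed = [errors[0]]
    for e in errors[1:]:
        smoothed.append(gamma * e + (1 - gamma) * smoothed[-1])
    return smoothed

def tolerance_band(forecast, deviation, width=3.0):
    """Band of `width` smoothed deviations on either side of the forecast."""
    return forecast - width * deviation, forecast + width * deviation

def is_anomaly(actual, forecast, deviation, width=3.0):
    """True when the measured value falls outside the tolerance band."""
    low, high = tolerance_band(forecast, deviation, width)
    return actual < low or actual > high

# Hypothetical history of measurements and the forecasts made for them.
past_actuals = [100, 105, 98, 102]
past_forecasts = [101, 103, 100, 101]
deviation = smoothed_abs_errors(past_actuals, past_forecasts)[-1]
band = tolerance_band(100, deviation)
```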

Slide 42

Slide 42 text

How do we determine trend & seasonality factors? • Triple Exponential Smoothing requires smoothing factors for the trend and seasonality • Trend is simple: it's the slope of the curve after applying single exponential smoothing • Seasonality is harder since it's non-linear

Slide 43

Slide 43 text

Enter Fourier Analysis Typically used in Digital Signal Processing,  but in both cases we're dealing with Sine waves https://en.wikipedia.org/wiki/Fourier_analysis
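One way Fourier analysis can help with seasonality, sketched under assumptions: compute a naive discrete Fourier transform and take the frequency with the most power as the season length. The talk does not spell out its exact procedure, so treat this as an illustration. The function name is hypothetical and the O(n^2) loop is only suitable for short series.

```python
import cmath
import math

def dominant_period(series):
    """Length, in samples, of the strongest repeating cycle in the series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the constant component
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                          for t, x in enumerate(centered))
        power = abs(coefficient) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k == 0:
        raise ValueError("series has no variation")
    return n / best_k

# Hypothetical hourly metric with a clean daily cycle, four days long.
hourly = [100 + 20 * math.sin(2 * math.pi * t / 24) for t in range(96)]
period = dominant_period(hourly)
```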

Slide 44

Slide 44 text

Early warnings for some extreme anomalies • We recently had a case where traffic went up 17,000% in a few minutes • What could we do to detect this and alert early enough?

Slide 45

Slide 45 text

Anomaly Detection on First & Second Order Differentials • First order differential is the rate of change (velocity) of our random variable    • Second order differential is the rate of change of rate of change (acceleration) of our random variable    • Applying the same anomaly detection algorithms to these deltas can help us identify when we're moving too fast in the wrong direction.
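A sketch of checking velocity and acceleration for outliers, using the median and Median Absolute Deviation as the yardstick. The limit of 3 spreads, the function names, and the traffic numbers are assumptions. If the spread is zero, any point that differs from the median is flagged.

```python
import statistics

def differences(values):
    """Change between each point and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]

def robust_outliers(values, limit=3.0):
    """Indexes of points more than `limit` scaled MADs away from the median."""
    center = statistics.median(values)
    spread = 1.4826 * statistics.median([abs(v - center) for v in values])
    return [i for i, v in enumerate(values) if abs(v - center) > limit * spread]

# Hypothetical requests per minute, ending in a sudden surge.
traffic = [100, 102, 101, 103, 102, 104, 103, 105, 400, 17000]

velocity = differences(traffic)       # first order differential
acceleration = differences(velocity)  # second order differential

fast_moves = robust_outliers(velocity)
fast_accelerations = robust_outliers(acceleration)
```

Index `i` in `velocity` covers the step from `traffic[i]` to `traffic[i + 1]`, and index `i` in `acceleration` covers the change between two consecutive steps.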

Slide 46

Slide 46 text

And when the stakeholders just want a colour Data Science Workbench

Slide 47

Slide 47 text

References
• Log-Normal Distribution: https://en.wikipedia.org/wiki/Log-normal_distribution
• Multi-modal distribution: https://en.wikipedia.org/wiki/Multimodal_distribution
• Marsaglia-Polar method for Gaussian random number generation: https://en.wikipedia.org/wiki/Marsaglia_polar_method
• JavaScript gist to generate Normal & Log-Normal random distributions: https://gist.github.com/bluesmoon/7925696
• Time to retire the Standard Deviation: https://www.edge.org/response-detail/25401
• Median Absolute Deviation: https://en.wikipedia.org/wiki/Median_absolute_deviation
• Inter-quartile Range: https://en.wikipedia.org/wiki/Interquartile_range
• Kolmogorov-Smirnov Test: https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test
• Kruskal-Wallis Test: https://en.wikipedia.org/wiki/Kruskal–Wallis_one-way_analysis_of_variance
• Nelson Rules: https://en.wikipedia.org/wiki/Nelson_rules
• Random Walk: https://en.wikipedia.org/wiki/Random_walk
• Moving Average: https://en.wikipedia.org/wiki/Moving_average
• Exponential Smoothing: https://en.wikipedia.org/wiki/Exponential_smoothing
• Fourier Analysis: https://en.wikipedia.org/wiki/Fourier_analysis
• SOASTA mPulse: https://www.soasta.com/performance-monitoring/
• SOASTA DSWB: https://www.soasta.com/data-science-workbench/
• boomerang: https://github.com/soasta/boomerang

Slide 48

Slide 48 text

Further Reading
• Kruskal-Wallis H Test using SPSS Statistics
• How to Think like a Data Scientist
• It Probably Works
• Regression v/s Curve Fitting
• Topology looks for the Patterns inside Big Data
• RegTools: A Julia Package for Assisting Regression Analysis
• Automating big-data analysis
• Animated Math
• Unsupervised Learning with Even Less Supervision Using Bayesian Optimization
• The tensor renaissance in data science
• Statisticians issue warning over misuse of P values
• How to share data with a statistician
• Statistics for Engineers: Applying statistical techniques to operations data
• Calculus Learning Guide
• Comparing Python Clustering Algorithms
• Finding surprising patterns in a time series database in linear time and space
• How to build an anomaly detection engine with Spark, Akka, and Cassandra
• Statistics Done Wrong: The woefully complete guide
• Circles, Sines & Signals
• How to actually learn Data Science
• The Risky Eclipse of Statisticians
• Fundamental frequency estimation and supervised learning
• Outlier Detection Gets a Makeover - Surprise Discovery in Scientific Big Data

Slide 49

Slide 49 text

Thank You

Slide 51

Slide 51 text

Image Credits
• Usain Bolt: https://www.flickr.com/photos/sumofmarc/7795253222/
• 100m dash: http://www.nytimes.com/interactive/2012/08/05/sports/olympics/the-100-meter-dash-one-race-every-medalist-ever.html
• Kilroy Schematic: http://en.wikipedia.org/wiki/File:KilroySchematic.svg
• Memes from ImgFlip