Abe Stanway
September 19, 2013

# MOM! My algorithms SUCK

Given at Monitorama.eu 2013 in Berlin. http://vimeo.com/75183236

## Transcript

4. ### that human will then alert a sleeping engineer when her metric does something weird

6. ### this works because humans are excellent visual pattern matchers* (*there are, of course, many advanced statistical applications where signal cannot be determined from noise just by looking at the data.)
7. ### can we teach software to be as good at simple anomaly detection as humans are?

10. ### humans can tell what “normal” is by just looking at a timeseries.
11. ### the human definition: “if a datapoint is not within reasonable bounds, more or less, of what usually happens, it’s an anomaly”

17. ### so, in math speak, a metric is anomalous if the absolute deviation of its latest datapoint from the mean is over three standard deviations
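
The three-sigma rule from this slide can be sketched in a few lines of Python (the function name and the default threshold argument are illustrative, not from the talk):

```python
import statistics

def is_anomalous(series, threshold=3.0):
    """Return True if the latest datapoint deviates from the mean of the
    series by more than `threshold` standard deviations."""
    mean = statistics.mean(series)
    stdev = statistics.pstdev(series)
    if stdev == 0:
        return False  # a perfectly flat series can't breach any band
    return abs(series[-1] - mean) > threshold * stdev
```

For example, `is_anomalous([10] * 99 + [100])` flags the spike, while a series that wobbles between 10 and 13 and ends on 13 does not breach the band.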

19. ### pioneered in the 1920s. heavily used in industrial engineering for quality control on assembly lines.

26. ### normal distribution: a more concise definition of good lookin’ (bell-curve figure: 34.1% of values within 1σ of μ on each side, 13.6% between 1σ and 2σ, 2.1% between 2σ and 3σ)
27. ### if you’ve got a normal distribution, chances are you’ve got an exchangeable, stationary series produced by independent random variables

29. ### (bell-curve figure, tails beyond 3σ shaded) if your datapoint is in here, it’s an anomaly.

32. ### ...where “signal” indicates a fundamental state change, as opposed to a random, improbable variation.
33. ### a fundamental state change in the process means a different probability distribution function that describes the process
34. ### anomaly detection: determining when probability distribution function shifts have occurred, as early as possible.

melted
39. ### processes with well planned expected values that only suffer small, random deviances when working properly...

48. ### skewed distributions! less than 99.73% of all values lie within 3σ, so breaching 3σ is not necessarily bad (figure: a skewed curve whose region beyond 3σ is a possibly normal range)
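
One way to see this slide's point is to sample a skewed distribution and count how many values escape the μ ± 3σ band. For an exponential distribution the fraction is roughly 1.8%, far above the 0.27% a normal distribution would allow (a hypothetical demonstration, not code from the talk):

```python
import random

random.seed(42)
# An exponential distribution is heavily right-skewed: mean = stdev = 1/λ.
samples = [random.expovariate(1.0) for _ in range(100_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
stdev = var ** 0.5

# Count datapoints outside the mean ± 3σ band.
outside = sum(1 for x in samples if abs(x - mean) > 3 * stdev)
fraction = outside / len(samples)
# Analytically the upper tail alone is P(X > 4/λ) = e⁻⁴ ≈ 1.8%,
# versus 0.27% total for a true normal distribution.
```

So on a skewed series, a naive 3σ alert fires roughly seven times more often than the normal-distribution math promises.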
49. ### the dirty secret: using SPC-based algorithms results in lots and lots of false positives, and probably lots of false negatives as well
50. ### no way to retroactively find the false negatives short of combing with human eyes!

53. ### ...after all, as long as the *errors* from the model are normally distributed, we can use 3σ
54. ### Parameters are cool! a pretty decent forecast based on an artisanal handcrafted model
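
The idea from the last two slides can be sketched as: fit some forecasting model, then apply the 3σ test to the forecast errors rather than to the raw series. Here a deliberately crude moving-average forecast stands in for the slide's artisanal handcrafted model; names, window size, and threshold are all illustrative:

```python
import statistics

def residual_anomaly(series, window=5, threshold=3.0):
    """Forecast each point as the mean of the previous `window` points,
    then flag the series if the latest forecast error is more than
    `threshold` standard deviations from the mean of past errors."""
    errors = []
    for i in range(window, len(series)):
        forecast = sum(series[i - window:i]) / window
        errors.append(series[i] - forecast)
    if len(errors) < 2:
        return False  # not enough history to judge the error distribution
    mean_err = statistics.mean(errors[:-1])
    sigma = statistics.pstdev(errors[:-1])
    if sigma == 0:
        return False
    return abs(errors[-1] - mean_err) > threshold * sigma
```

The point of the detour through errors: even if the raw series trends or oscillates (and so is nowhere near normally distributed), the model's residuals may still be, which is what licenses the 3σ test.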

56. ### possible to implement a class of ML algorithms that determine models based on distribution of errors, using Q-Q plots
57. ### Q-Q plots can also be used to determine if the PDF has changed, although this is hard to do with limited sample sizes
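
A Q-Q comparison like the one these slides describe can be sketched with the standard library alone: compute sample quantiles against the quantiles of a normal distribution fitted to the sample, and look at how far the points stray from the y = x line. Large gaps suggest the errors are not normal, so the 3σ rule rests on shaky ground. All names here are illustrative:

```python
from statistics import NormalDist, mean, pstdev

def qq_points(sample):
    """Return (theoretical, observed) quantile pairs against a normal
    distribution fitted to the sample."""
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(sample), pstdev(sample))
    # Plotting positions (i + 0.5) / n avoid the impossible 0 and 1 quantiles.
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

def max_qq_deviation(sample):
    """Largest absolute gap between observed and theoretical quantiles,
    scaled by the sample's standard deviation."""
    s = pstdev(sample)
    return max(abs(obs - theo) for theo, obs in qq_points(sample)) / s
```

A genuinely Gaussian sample hugs the line; a skewed sample (say, exponential) pulls away hard in the tails, which is exactly where the small-sample caveat on this slide bites, since tails are the last place to fill in.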
58. ### consensus: throw lots of different models at a series, hope it all shakes out.
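
The consensus idea, the pattern used by ensemble detectors such as Etsy's Skyline, can be sketched as a handful of crude detectors plus a vote. The individual detectors and thresholds below are illustrative stand-ins, not Skyline's actual algorithms:

```python
import statistics

def three_sigma(series):
    """Latest point vs. 3σ band over the whole series."""
    s = statistics.pstdev(series)
    return s > 0 and abs(series[-1] - statistics.mean(series)) > 3 * s

def beyond_last_window(series, window=10):
    """Latest point vs. 3σ band over only the most recent window."""
    tail = series[-window - 1:-1]
    s = statistics.pstdev(tail)
    return s > 0 and abs(series[-1] - statistics.mean(tail)) > 3 * s

def median_rule(series):
    """Latest point vs. a median/MAD band, robust to outliers in history.
    The 6×MAD threshold is a rough, illustrative analogue of 3σ."""
    med = statistics.median(series)
    mad = statistics.median(abs(x - med) for x in series)
    return mad > 0 and abs(series[-1] - med) > 6 * mad

def consensus_anomalous(series,
                        detectors=(three_sigma, beyond_last_window, median_rule),
                        consensus=2):
    """Alert only if at least `consensus` detectors agree."""
    return sum(d(series) for d in detectors) >= consensus
```

Requiring agreement trades a few missed anomalies for far fewer of the false positives the earlier slides complain about, which only helps if the detectors fail in different ways.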

60. ### of course, if your models are all SPC-based, this doesn’t really get you anywhere