Is There An Echo In Here? Signal Analysis for Ops (with notes)

Signal Analysis for Ops Is There An Echo In Here?
Noah Kantrowitz Tuesday, May 6, 14

MATH AHEAD Tuesday, May 6, 14 Upfront warning: this talk
is about math. It will introduce you to the basic concepts and techniques in signal processing in the hopes this will give you tools to apply to your operations life.

This is a Metric Tuesday, May 6, 14 With that
out of the way, let's start the beginning. Here is a single metric.

Value @ Time Tuesday, May 6, 14 It has a
value and a timestamp. Some metrics may have more metadata than that, but a value at a time is pretty much the minimum you can have.

Tuesday, May 6, 14 Very often we put metrics in
graphs, again we have the value on the y axis and the time on the x axis.

metric.wav Tuesday, May 6, 14 So to cut to the
chase a bit, this is exactly the same structure as audio data, and many similar signals.

Tuesday, May 6, 14 Many of you have probably seen
an image like this at some point.

Tuesday, May 6, 14 Or possibly this. But what are
these graphs coming from? They don't look like our metric graphs, but we said audio data and metrics are basically the same thing.

Frequency Domain Tuesday, May 6, 14 So before we had
a graph of value over time, this is what is called "time domain". Those audio visualizers are different, instead they show value over frequency. This is the frequency domain.

Frequency 0hz 20Hz Tuesday, May 6, 14 So here is
an example frequency domain graph. Across the bottom we have different frequencies, starting at 0Hz and going up. The y axis a bit fuzzy, we generally just worry about comparisons instead of using speciﬁc absolute values.

Value +0dB +50dB Tuesday, May 6, 14 You might sometimes
see the y axis measured decibels. Decibels are just a compact way to represent a ratio between two values. +10dB is the same things as saying times 10, and -20dB is the same as saying divided by 100.

Fourier Transform Tuesday, May 6, 14 But we don't have
frequency domain information, we have time domain metrics. The Fourier transform is how you convert from time to frequency domain.

ˆ f ( ⇠ ) = Z 1 1 f
( x ) e 2 ⇡ix⇠ dx Tuesday, May 6, 14 Here is the formal deﬁnition, which probably means about as much to you as it does to me, so lets go through it piece by piece

Tuesday, May 6, 14 So a super simple graph, a
sine wave at 1hz.

Tuesday, May 6, 14 To that lets add another sine
wave, this time at 4hz and with half the magnitude. This means our second graph is half the height and changing four times faster.

Tuesday, May 6, 14 Add them together and we get
this. Now we are starting to get in to the realm of the kind of graphs ops folks look at all day, complex curves without super clear patterns. This is still pretty normal looking, so lets add some more data to this.

Tuesday, May 6, 14 Okay, thats a graph I could
get back from graphite. From our progression you can maybe still see those original 1hz and 4hz waves, but if you walked in one day to a load average graph that looked like this you might not recognize that. This is where the Fourier transform comes in. It can take any stream of points and show you what periodic components contributed to it and how much.

1Hz 4 Hz/2 10 Hz/4 16 Hz/4 2 Hz/10 Tuesday,
May 6, 14 So lets try it out. Here is the output from a Fourier Transform on that noisy graph. Immediately you can see the ﬁve periodic components and their relative strengths. I even threw in some random variation on each point, but on the scale of this graph its invisible. Now we can start to see why this kind of analysis, even without further processing, can be useful. How many of you would have missed that little 2Hz part in the original graph?

Tuesday, May 6, 14 For the more visual in the
audience, here you can see the all ﬁve sine waves as split out against the Fourier transform.

ˆ f ( ⇠ ) = Z 1 1 f
( x ) e 2 ⇡ix⇠ dx Tuesday, May 6, 14 So now we know what a Fourier transform is. What about that scary looking equation? Those inﬁnities are not usually a good sign when trying to write code.

DFT DTFT Tuesday, May 6, 14 So instead we use
the discrete Fourier transform, or more often its close cousin the discrete- time Fourier transform. Rather than functions, these operate on an array of samples. If ops folks have anything it is arrays of numbers.

Xk = N 1 X n=0 xne 2⇡i N nk
Tuesday, May 6, 14 Scary equation take two. This is a bit easier to wrap our brains around. We plug in each value of k from 0 to N-1 and we get our transformed data. If you are familiar with sigmas you might have noticed the downside though, this is going to be slow since for every frequency we want to analyze, we need to do a sum across every data point.

FFT Tuesday, May 6, 14 So instead of the plain
DFT, we can use a faster divide-and-conquer algorithm called a fast Fourier transform. These work by analyzing subsets of the data recursively. With an FFT, we can get order n log n instead of n squared.

IFT Tuesday, May 6, 14 And one last acronym for
you, inverse Fourier transform. This lets us reverse the process, going from the frequency domain back to time domain but adding up all the sine waves.

Low-pass Tuesday, May 6, 14 So okay, now we have
some new tools, converting from time domain to frequency domain, and then back again. What are we going to do with it? Let's look at a simple ﬁlter called a low-pass. This means that it only allows signals below a certain frequency, removing the higher frequencies.

Tuesday, May 6, 14 So lets take another graph that
will probably look unpleasantly familiar to many of you. We have a simple alert threshold, but something is ﬂapping (to use the nagios term) so the pager is going off constantly. This sucks and generates lots of excess noise during incidents. Nagios includes a feature called ﬂap detection that tries to suppress these, but it is unpleasant to say the least.

Tuesday, May 6, 14 Here is the same data in
the frequency domain. You can see that we have clearly crossed the alerting threshold, but that it is only the high frequency component that is doing so. Lets take that yellow line as a ﬁltering point and see how we can transform this data.

Tuesday, May 6, 14 First we discard everything over our
ﬁltering point.

Tuesday, May 6, 14 And then we use an inverse
Fourier transform to get back to time domain. We can now see our normal background variance is doing just ﬁne, well under the alerting threshold. So now we can alert on both frequency and ﬁltered time domain so we get just the one initial alert. Or maybe we want different thresholds for short lived vs. long lived events. Once you have these tools to play with, you can use them to build all kinds of pipelines.

High-pass Band-pass Tuesday, May 6, 14 So thats the basics
of a low pass filter, but you can use the concepts for changing which frequencies you alter in different ways. A high pass filter would be allowing only high frequencies, and a band pass filter is allowing only frequencies in a specific interval.

Windowing Tuesday, May 6, 14 Next signal analysis concept, window
functions. A window function is a mask you can multiply the signal by to analyze just one piece at a time. In ops, this is more or less a given from the start, we can only store a ﬁnite (and often quite small) number of points so there will be some horizon before which we can't see. Unfortunately this comes at a cost, spectral leakage. This is the distortion of the computed frequency domain due to lack of information.

Tuesday, May 6, 14 The simplest window is a rectangle.
In fact any ﬁnite time constraint on our data can be taken as a rectangular window, though usually the noise is at such low frequencies we ignore it. Rectangular windows have the lowest levels of leakage as measured by total noise, but they are often not the best option for analyzing smaller time slices as the leakage it does create is broader.

Tuesday, May 6, 14 To make this a bit clearer,
here is the frequency domain of a 7hz sine wave with a big rectangular window. It looks exactly like we would expect, a sharp spike right at 7hz and within a rounding error of 0 everywhere else. Any noise introduced is so tiny we can't see it.

Tuesday, May 6, 14 Here is the same wave with
a three quarters of a second window instead. Now instead of a sharp peak we get a gradual taper, some of our 7hz signal has leaked in to adjacent buckets in the Fourier transform.

Tuesday, May 6, 14 So here is a spectral leakage
diagram. There are two important bits you need to look at. First we have the main lobe in the center. Here we can see the nice part about the rectangular window, the main lobe is very tall meaning low signal loss, and very narrow meaning low leakage between frequencies. Everything other than the main lobe are called side lobes, and here we see the problem with the rectangular windows, the side lobes are numerous and also very tall meaning that our noise frequencies aren't being suppressed.

Tuesday, May 6, 14 Another simple option is a triangular
window. It is better than a rectangle, but not by a whole lot.

Tuesday, May 6, 14 So to compare to the last
leakage graph, we can see out main lob is bigger but not a ton, and our side lobes are reduced, but again not by much.

Tuesday, May 6, 14 A pretty common general-purpose window function
is the Blackman-Harris curve. This is a 4-term Blackman-Harris, which is a balance between the main lob and side lobes

Tuesday, May 6, 14 As compared to the previous two,
you can see the side lobes are very suppressed, while the main lobe is much bigger than before. So this means that our main frequency gets, to use a technical term, a bit mushy, but our noise frequencies will be much less visible.

Tuesday, May 6, 14 To make this a bit clearer,
here is our same 7Hz signal with a Blackman-Harris window, and our original rectangular window as the dotted line. The peak has been lowered and it bleeds a bit more into neighboring buckets, but the buckets further away from the peak drop to zero much faster. This can help in a lot of situations when you a looking for small peaks in a multi-frequency signal, though you risk nearby frequencies getting lost in each other. This isn't usually a huge problem, but pick your window functions to match your analysis. Check out the wikipedia article on window functions for an excellent overview of tons of window functions.

NumPy Tuesday, May 6, 14 So now you want to
go forth and use these ideas. The best one-stop-shop is the Python library NumPy. It offers not just Fourier transforms, but many many numeric analysis tools, as well as integration with other scientiﬁc computing libraries and plotting tools.

FFTW Tuesday, May 6, 14 FFTW is also an option,
and has bindings in most languages, though of highly varying quality levels.

go-dsp Tuesday, May 6, 14 go-dsp is another relatively minimalist
option for Go.

Ruby Tuesday, May 6, 14 Unfortunately Ruby, the workhorse of
many Ops teams, has no good tooling for scientiﬁc or mathematical computing, so you are mostly on your own there.

Go forth and ﬁnd the signal Tuesday, May 6, 14

Thank You Tuesday, May 6, 14

Bonus Round Tuesday, May 6, 14 Some extra content if
there is time.

DCT Tuesday, May 6, 14 The discrete cosine transform is
a data compression technique closely related to the Fourier transform. The overall idea is simple, convert to frequency domain, pick out the most important spectral components (usually this means discarding unhelpfully-high frequencies), and then store only those. When you want to recreate the original signal, just run an inverse DCT, which for our discussion is pretty much the same as an inverse FFT. The advantage of all of this is the transformed data takes up signiﬁcantly less space than the original data. This is literally the difference between between bitmaps and JPEGs, or wave ﬁles and MP3s.

Wavelets Tuesday, May 6, 14 Wavelet transforms and the discrete
wavelet transform in particular are similar to DCTs in application, though very different internally. They have many advantages with small, aperiodic elements and so might be more apt for compressing metrics data. They are currently at the heart of most top-of-the-line image and video compression schemes. As with DCTs, anyone looking to store large amounts of metrics data should look at these techniques.

Noise Gate Tuesday, May 6, 14 In a totally different
direction, lets look a noise gates. These are commonly used in audio processing to reduce noise and crosstalk in a signal.

Tuesday, May 6, 14 The idea is pretty simple, instead
of a single threshold have two. The alert happens when you pass the upper threshold, but only stops when you pass the lower threshold. In the example we are smoothing from four alerts ﬁring to just one.

Hysteresis Tuesday, May 6, 14 This general category of techniques
is called hysteresis. This is a fancy word for any process where the output is related the input at the current time and the past value of the output. In the case of our noise gate, this is basically just a simple form of memory inside the algorithm.

Control Theory Tuesday, May 6, 14 And ﬁnally, if all
this piqued your interest in math again, another topic to check out is control theory. Signal analysis can help you determine anomalies or ﬁnd patterns in your metrics, control theory can help you turn that around into changes to your systems. The simplest example is going from "My CPU usage is too high" to "I need to launch new servers". The trick is usually knowing how many servers to launch and when.

PID Control Tuesday, May 6, 14 And if you only
learn one control algorithm, check out the PID loop controller. Or maybe I'll just have to come back next year and do a talk on it.

Thank You Tuesday, May 6, 14

Is There An Echo In Here? Signal Analysis for O...

Is There An Echo In Here? Signal Analysis for Ops (with notes)

More Decks by Noah Kantrowitz

Other Decks in Technology

Featured

Transcript