an example frequency domain graph. Across the bottom we have different frequencies, starting at 0Hz and going up. The y axis a bit fuzzy, we generally just worry about comparisons instead of using speciﬁc absolute values.
this. Now we are starting to get in to the realm of the kind of graphs ops folks look at all day, complex curves without super clear patterns. This is still pretty normal looking, so lets add some more data to this.
get back from graphite. From our progression you can maybe still see those original 1hz and 4hz waves, but if you walked in one day to a load average graph that looked like this you might not recognize that. This is where the Fourier transform comes in. It can take any stream of points and show you what periodic components contributed to it and how much.
May 6, 14 So lets try it out. Here is the output from a Fourier Transform on that noisy graph. Immediately you can see the ﬁve periodic components and their relative strengths. I even threw in some random variation on each point, but on the scale of this graph its invisible. Now we can start to see why this kind of analysis, even without further processing, can be useful. How many of you would have missed that little 2Hz part in the original graph?
the discrete Fourier transform, or more often its close cousin the discrete- time Fourier transform. Rather than functions, these operate on an array of samples. If ops folks have anything it is arrays of numbers.
Tuesday, May 6, 14 Scary equation take two. This is a bit easier to wrap our brains around. We plug in each value of k from 0 to N-1 and we get our transformed data. If you are familiar with sigmas you might have noticed the downside though, this is going to be slow since for every frequency we want to analyze, we need to do a sum across every data point.
DFT, we can use a faster divide-and-conquer algorithm called a fast Fourier transform. These work by analyzing subsets of the data recursively. With an FFT, we can get order n log n instead of n squared.
some new tools, converting from time domain to frequency domain, and then back again. What are we going to do with it? Let's look at a simple ﬁlter called a low-pass. This means that it only allows signals below a certain frequency, removing the higher frequencies.
will probably look unpleasantly familiar to many of you. We have a simple alert threshold, but something is ﬂapping (to use the nagios term) so the pager is going off constantly. This sucks and generates lots of excess noise during incidents. Nagios includes a feature called ﬂap detection that tries to suppress these, but it is unpleasant to say the least.
the frequency domain. You can see that we have clearly crossed the alerting threshold, but that it is only the high frequency component that is doing so. Lets take that yellow line as a ﬁltering point and see how we can transform this data.
Fourier transform to get back to time domain. We can now see our normal background variance is doing just ﬁne, well under the alerting threshold. So now we can alert on both frequency and ﬁltered time domain so we get just the one initial alert. Or maybe we want different thresholds for short lived vs. long lived events. Once you have these tools to play with, you can use them to build all kinds of pipelines.
of a low pass ﬁlter, but you can use the concepts for changing which frequencies you alter in different ways. A high pass ﬁlter would be allowing only high frequencies, and a band pass ﬁlter is allowing only frequencies in a speciﬁc interval.
functions. A window function is a mask you can multiply the signal by to analyze just one piece at a time. In ops, this is more or less a given from the start, we can only store a ﬁnite (and often quite small) number of points so there will be some horizon before which we can't see. Unfortunately this comes at a cost, spectral leakage. This is the distortion of the computed frequency domain due to lack of information.
In fact any ﬁnite time constraint on our data can be taken as a rectangular window, though usually the noise is at such low frequencies we ignore it. Rectangular windows have the lowest levels of leakage as measured by total noise, but they are often not the best option for analyzing smaller time slices as the leakage it does create is broader.
here is the frequency domain of a 7hz sine wave with a big rectangular window. It looks exactly like we would expect, a sharp spike right at 7hz and within a rounding error of 0 everywhere else. Any noise introduced is so tiny we can't see it.
diagram. There are two important bits you need to look at. First we have the main lobe in the center. Here we can see the nice part about the rectangular window, the main lobe is very tall meaning low signal loss, and very narrow meaning low leakage between frequencies. Everything other than the main lobe are called side lobes, and here we see the problem with the rectangular windows, the side lobes are numerous and also very tall meaning that our noise frequencies aren't being suppressed.
you can see the side lobes are very suppressed, while the main lobe is much bigger than before. So this means that our main frequency gets, to use a technical term, a bit mushy, but our noise frequencies will be much less visible.
here is our same 7Hz signal with a Blackman-Harris window, and our original rectangular window as the dotted line. The peak has been lowered and it bleeds a bit more into neighboring buckets, but the buckets further away from the peak drop to zero much faster. This can help in a lot of situations when you a looking for small peaks in a multi-frequency signal, though you risk nearby frequencies getting lost in each other. This isn't usually a huge problem, but pick your window functions to match your analysis. Check out the wikipedia article on window functions for an excellent overview of tons of window functions.
go forth and use these ideas. The best one-stop-shop is the Python library NumPy. It offers not just Fourier transforms, but many many numeric analysis tools, as well as integration with other scientiﬁc computing libraries and plotting tools.
a data compression technique closely related to the Fourier transform. The overall idea is simple, convert to frequency domain, pick out the most important spectral components (usually this means discarding unhelpfully-high frequencies), and then store only those. When you want to recreate the original signal, just run an inverse DCT, which for our discussion is pretty much the same as an inverse FFT. The advantage of all of this is the transformed data takes up signiﬁcantly less space than the original data. This is literally the difference between between bitmaps and JPEGs, or wave ﬁles and MP3s.
wavelet transform in particular are similar to DCTs in application, though very different internally. They have many advantages with small, aperiodic elements and so might be more apt for compressing metrics data. They are currently at the heart of most top-of-the-line image and video compression schemes. As with DCTs, anyone looking to store large amounts of metrics data should look at these techniques.
of a single threshold have two. The alert happens when you pass the upper threshold, but only stops when you pass the lower threshold. In the example we are smoothing from four alerts ﬁring to just one.
is called hysteresis. This is a fancy word for any process where the output is related the input at the current time and the past value of the output. In the case of our noise gate, this is basically just a simple form of memory inside the algorithm.
this piqued your interest in math again, another topic to check out is control theory. Signal analysis can help you determine anomalies or ﬁnd patterns in your metrics, control theory can help you turn that around into changes to your systems. The simplest example is going from "My CPU usage is too high" to "I need to launch new servers". The trick is usually knowing how many servers to launch and when.