Slide 1

Slide 1 text

Introduction to Skyline • @takus • Monitoring Casual #4 • 2013.07.12

Slide 2

Slide 2 text

Velocity 2013 • Web Performance & Operations Conference • Great talks • Great people • Great culture

Slide 3

Slide 3 text

My LT Talk at Velocity • 3 Popular Ops Tools in Japan • serverspec, growthforecast, fluentd • Uploaded on YouTube :-( • http://www.youtube.com/watch?v=bRYuBQyG5Sw

Slide 4

Slide 4 text

Some Positive Feedback

Slide 5

Slide 5 text

The Most Interesting Talk • Avoiding Performance Regression at Twitter • fight against perf regression in an automated fashion • http://ameblo.jp/principia-ca/entry-11561132297.html

Slide 6

Slide 6 text

Today’s Topic

Slide 7

Slide 7 text

Background • Etsy deploys their app 30+ times per day • Optimize for quick recovery by anticipating problems, instead of fearing human error • Can’t fix what you don’t measure! • If it moves, graph it

Slide 8

Slide 8 text

Too Many Graphs • 250,000+ dashboards • If a graph spikes and no one is watching, does it really spike? • There are things we do not know we don’t know.

Slide 9

Slide 9 text

Kale • Skyline • Detect unknown anomalies • Oculus • Detect unknown correlations • http://codeascraft.com/2013/06/11/introducing-kale/

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Architecture • [Diagram] Components: Horizon (x3), Redis, Analyzer Manager, Analyzer (x3) • Labels: Assign Metrics Keys, Get Metrics Keys, Fetch Timeseries, Metrics

Slide 12

Slide 12 text

Horizon • Listeners • Receive metrics and put them on a queue • Workers • Insert metrics into Redis, encoded with MessagePack • Roombas • Purge metrics in Redis at a regular interval
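The three Horizon roles can be pictured with a small sketch. This is not Skyline's code: a plain dict stands in for Redis, `json` stands in for MessagePack so that only the standard library is needed, and the names `listen`, `work`, `roomba`, and `max_age` are hypothetical. The slide does not say what the Roombas purge by; dropping datapoints older than a fixed age is an assumption.

```python
import json
import queue

# Hypothetical in-memory stand-in for Redis: metric name -> list of encoded datapoints.
store = {}
incoming = queue.Queue()


def listen(name, timestamp, value):
    """Listener: accept one datapoint and put it on the queue."""
    incoming.put((name, timestamp, value))


def work():
    """Worker: drain the queue and append encoded datapoints to the store."""
    while not incoming.empty():
        name, timestamp, value = incoming.get()
        # json.dumps stands in for MessagePack encoding (assumption for this sketch).
        store.setdefault(name, []).append(json.dumps([timestamp, value]))


def roomba(now, max_age):
    """Roomba: drop datapoints older than max_age seconds (assumed purge rule)."""
    for name in list(store):
        kept = [p for p in store[name] if now - json.loads(p)[0] <= max_age]
        if kept:
            store[name] = kept
        else:
            del store[name]
```

In a real deployment each role would run as its own process and loop forever; here they are plain functions so the data flow is easy to follow.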

Slide 13

Slide 13 text

Analyzer • Assign Redis keys to each process • Decode from MessagePack • Run the detection algorithms
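One way to picture the key assignment step is to split the list of metric keys into one slice per analyzer process. The slide does not say how Skyline divides the keys, so the contiguous-slice rule and the name `assign_keys` below are assumptions made for illustration.

```python
def assign_keys(keys, process_count):
    """Split metric keys into one contiguous slice per process (assumed scheme)."""
    per_process = -(-len(keys) // process_count)  # ceiling division
    return [
        keys[i * per_process:(i + 1) * per_process]
        for i in range(process_count)
    ]
```

Each process would then fetch the timeseries for its own slice, decode it, and run the detection algorithms on it.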

Slide 14

Slide 14 text

How to Detect Anomalies? • Consensus model • If the majority of algorithms agree, the metric will be classified as anomalous • Use your own algorithm for each application • as long as you return a boolean, you can add any sort of algorithm you like
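The consensus model can be sketched in a few lines. The only contract taken from the slide is that each algorithm returns a boolean and that a majority must agree. The three example algorithms and the use of a plain list of values as the timeseries are hypothetical and chosen only to make the sketch runnable.

```python
def last_is_max(timeseries):
    """Hypothetical algorithm: the latest value is the largest seen."""
    return timeseries[-1] == max(timeseries)


def last_exceeds_twice_mean(timeseries):
    """Hypothetical algorithm: the latest value is more than twice the mean."""
    return timeseries[-1] > 2 * (sum(timeseries) / len(timeseries))


def last_is_negative(timeseries):
    """Hypothetical algorithm: the latest value is below zero."""
    return timeseries[-1] < 0


def is_anomalous(timeseries, algorithms):
    """Return True when a strict majority of the algorithms vote anomalous."""
    votes = [bool(algorithm(timeseries)) for algorithm in algorithms]
    return sum(votes) > len(votes) / 2
```

Adding an application-specific check then only means writing another function that takes the timeseries and returns a boolean, and appending it to the list passed to `is_anomalous`.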

Slide 15

Slide 15 text

Basic Algorithm • A metric is anomalous if its latest datapoint is over three standard deviations above its moving average
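A minimal sketch of this rule, assuming the moving average and standard deviation are computed over a fixed window of datapoints preceding the latest one. The window size, the exclusion of the latest point from the window, and the name `three_sigma` are assumptions; the slide states only the three-standard-deviations criterion.

```python
from statistics import mean, pstdev


def three_sigma(values, window=10):
    """True if the latest value is over 3 standard deviations above the
    moving average of the preceding `window` values (assumed window rule)."""
    history = values[-(window + 1):-1]
    if len(history) < 2:
        return False  # not enough data to estimate a deviation
    return values[-1] > mean(history) + 3 * pstdev(history)
```

Because it returns a boolean, a function like this fits directly into a consensus vote alongside other algorithms.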

Slide 16

Slide 16 text

Anomaly?

Slide 17

Slide 17 text

Conclusion • Etsy monitors 250,000+ graphs • If a graph spikes and no one is watching, does it really spike? • Skyline • detects unknown anomalies • consensus among any algorithms you like