This is my talk for Monitorama EU 2013. It covers how we at gutefrage.net store our metrics in OpenTSDB and our experiences with it.
It also includes some thoughts on related topics such as alerting and some creative (or crazy?) metrics.
Who am I? Senior Engineer, Data and Infrastructure, at gutefrage.net GmbH. Was doing software development before. DevOps advocate. Saturday, 21 September 2013
Who is Gutefrage.net? Germany's biggest Q&A platform. #1 German site (mobile), about 5M unique users. #3 German site (desktop), about 17M unique users. More than 4 million page impressions (PI) per day. Part of the Holtzbrinck group. Running several platforms (Gutefrage.net, Helpster.de, Cosmiq, Comprano, ...).
What you will get: How do we store our metrics? Our experiences with that setup. Why the hell are we doing that? Some thoughts on metrics.
The ecosystem: App feeds metrics in via RabbitMQ. We base Icinga checks on the metrics. We evaluate Etsy Skyline for anomaly detection. We deploy sensors via Chef.
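Basing an Icinga check on a metric mostly comes down to mapping a value onto the standard plugin exit codes. The following is a minimal sketch, not our actual check: the function name `evaluate` and the thresholds are assumptions for illustration, and fetching the latest value from the metrics store is deliberately left out.

```python
# Standard Nagios/Icinga plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(value, warn, crit):
    """Map a metric value to (exit_code, message); higher values are worse.

    `value` is assumed to be the latest data point of a time series,
    or None if no data point could be read.
    """
    if value is None:
        return UNKNOWN, "UNKNOWN - no data point available"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value {value} >= {crit}"
    if value >= warn:
        return WARNING, f"WARNING - value {value} >= {warn}"
    return OK, f"OK - value {value}"
```

A real plugin would print the message on one line and exit with the returned code, for example via `sys.exit(code)`.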
What works well: We store about 200M data points in several thousand time series with no issues. tcollector decouples measurement from storage. Creating new metrics is really easy.
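Part of why new metrics are cheap: a tcollector collector is just a program that writes lines of the form `metric timestamp value tag=value ...` to stdout, and tcollector takes care of shipping them. A minimal sketch of producing such a line follows; the helper `format_line` and the metric name in the usage note are hypothetical.

```python
import time


def format_line(metric, value, tags=None, timestamp=None):
    """Build one 'metric timestamp value tag=value ...' line.

    Timestamps are whole seconds; tags are sorted for stable output.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_str = " ".join(f"{k}={v}" for k, v in sorted((tags or {}).items()))
    return f"{metric} {ts} {value} {tag_str}".rstrip()
```

A collector would then simply `print(format_line("app.logins", 3, {"host": "web01"}))` in a loop and flush stdout.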
Challenges: The UI is seriously lacking. No annotation support out of the box. Only 1 s time resolution (and only 1 value per second per time series).
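One way to live with the limit of one value per second per time series is to collapse measurements on the sending side before they are stored. This is a sketch of that idea under our own assumptions, not something OpenTSDB or tcollector provides; the function name `collapse_per_second` is hypothetical.

```python
from collections import defaultdict


def collapse_per_second(points, combine=sum):
    """Collapse (timestamp, value) pairs to one value per whole second.

    `points` may carry sub-second float timestamps. `combine` decides how
    values within the same second are merged (sum for counters, max or a
    mean function for gauges).
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[int(ts)].append(value)
    return [(sec, combine(vals)) for sec, vals in sorted(buckets.items())]
```

Whether to sum, average, or take the maximum depends on what the metric means, so the choice is left to the caller.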
Getting insights: We are moving towards Continuous Deployment. Complex systems show emergent behaviour. Graphs are the correct flight level.
Alerting: Fixed values for alerts are not good enough. Drawing attention vs. alerting. False positives are bugs. Don't call the on-call guy for nothing.
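To illustrate what "not a fixed value" can mean, here is one simple alternative: compare the current value against the recent history of the same series instead of a hard-coded limit. This is only a sketch of a mean-and-standard-deviation rule; it is not the algorithm Skyline uses and not what we run in production, and the name `is_anomalous` and the default `k=3.0` are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, current, k=3.0):
    """Flag `current` if it deviates more than k standard deviations
    from the mean of `history` (recent values of the same series)."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # flat series: any change stands out
    return abs(current - mu) > k * sigma
```

A rule like this adapts to each series' normal range, but it still produces false positives on seasonal data, so it fits "drawing attention" better than paging someone.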
Metrics != boring: You can (and should) get creative with what you measure. Have some brainstorming sessions. Insights may come from surprising places.
Track ops confidence: Create a platform where you can buy or sell your on-call shifts. The price for a shift tells you how confident the team is. This has not been tested yet.
Track recruiting efforts: Helps to get a feeling for the job market. Reminds everyone to keep looking for new colleagues. BTW: we are hiring ;-)
Image sources: Plane: Felix Gottwald - www.felixgottwald.net (Creative Commons Attribution Share Alike 3.0 Germany). Talking men: Deutsche Fotothek - Peter, Richard sen. Money: Wikimedia contributor Avij. Other images: Oliver Hankeln. This presentation is licenced under Creative Commons Attribution Share Alike 3.0.