Slide 1

Slide 1 text

Graph everything!
Oliver Hankeln / gutefrage.net
Saturday, 21 September 2013

Slide 2

Slide 2 text

Who am I?
- Senior Engineer - Data and Infrastructure at gutefrage.net GmbH
- Was doing software development before
- DevOps advocate

Slide 3

Slide 3 text

Who is Gutefrage.net?
- Germany's biggest Q&A platform
- #1 German site (mobile), about 5M unique users
- #3 German site (desktop), about 17M unique users
- > 4 million page impressions/day
- Part of the Holtzbrinck group
- Running several platforms (Gutefrage.net, Helpster.de, Cosmiq, Comprano, ...)

Slide 4

Slide 4 text

Flight AB6188

Slide 5

Slide 5 text

What you will get
- How do we store our metrics?
- Our experiences with that setup
- Why the hell are we doing that?
- Some thoughts on metrics

Slide 6

Slide 6 text

How we store our metrics

Slide 7

Slide 7 text

Our requirements
- Creating new metrics has to be simple
- No compaction (bye bye RRDTool)
- System has to scale

Slide 8

Slide 8 text

OpenTSDB
- Written at StumbleUpon, but open source
- Uses HBase as storage
- Distributed system (multiple TSDs)
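As an illustration of how data reaches a TSD: OpenTSDB accepts data points over a plain-text, telnet-style protocol, one `put` line per point (metric name, Unix timestamp, value, at least one tag). The following is a minimal sketch, not the setup used at gutefrage.net. The host name, metric name, and tags are made up for the example; 4242 is the port a TSD usually listens on.

```python
import socket


def format_put(metric, timestamp, value, tags):
    """Build one OpenTSDB telnet-style 'put' line (without trailing newline)."""
    if not tags:
        raise ValueError("OpenTSDB requires at least one tag per data point")
    # Sort tags so the generated line is deterministic.
    tag_str = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {int(timestamp)} {value} {tag_str}"


def send_points(lines, host="tsd.example.internal", port=4242):
    """Send put lines to a TSD. The host name is hypothetical."""
    payload = "".join(line + "\n" for line in lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Hypothetical metric and tags, for illustration only.
line = format_put("web.pageviews", 1379750400, 42, {"host": "web01", "site": "gutefrage"})
print(line)
```

Calling `send_points([line])` would deliver the point, provided a TSD is reachable at the assumed address.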

Slide 9

Slide 9 text

The ecosystem
- App feeds metrics in via RabbitMQ
- We base Icinga checks on the metrics
- We are evaluating Etsy's Skyline for anomaly detection
- We deploy sensors via Chef
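To make "Icinga checks based on metrics" concrete: Icinga runs Nagios-compatible plugins, which report status through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one line of output. The sketch below only shows the mapping from a metric value to that result. Fetching the value from the TSD is left out, and the metric name and thresholds are hypothetical.

```python
# Plugin exit codes shared by Nagios-compatible monitoring systems such as Icinga.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_metric(name, value, warn, crit):
    """Map the latest value of a metric to (exit_code, status_line)."""
    if value is None:
        return UNKNOWN, f"UNKNOWN - no data for {name}"
    if value >= crit:
        status = CRITICAL
    elif value >= warn:
        status = WARNING
    else:
        status = OK
    # Text after '|' is performance data in the form label=value;warn;crit.
    return status, f"{LABELS[status]} - {name} is {value} | {name}={value};{warn};{crit}"


# Hypothetical metric name and thresholds.
code, message = evaluate_metric("db.queries_per_pageview", 12, warn=10, crit=20)
print(message)
```

A real plugin would print `message` and end with `sys.exit(code)` so that Icinga picks up the state.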

Slide 10

Slide 10 text

Our experiences

Slide 11

Slide 11 text

What works well
- We store about 200M data points in several thousand time series with no issues
- tcollector decouples measurement from storage
- Creating new metrics is really easy
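The decoupling works because a tcollector collector is just a script that prints lines of the form `metric timestamp value tag=value` to stdout; tcollector takes care of forwarding them to a TSD. A minimal sketch of such a collector follows. The function names, metric name, and tag are invented for the example.

```python
import sys
import time


def format_sample(metric, timestamp, value, **tags):
    """Format one sample the way a tcollector collector prints it."""
    parts = [metric, str(int(timestamp)), str(value)]
    parts += [f"{key}={val}" for key, val in sorted(tags.items())]
    return " ".join(parts)


def emit(metric, value, out=sys.stdout, now=None, **tags):
    """Write one sample to stdout; the collector never talks to storage itself."""
    timestamp = time.time() if now is None else now
    out.write(format_sample(metric, timestamp, value, **tags) + "\n")
    out.flush()


# Hypothetical metric and tag, with a fixed timestamp for reproducibility.
emit("app.logins", 7, now=1379750400, datacenter="muc")
```

Because the script only writes to stdout, it can be tested and run without any storage backend being available.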

Slide 12

Slide 12 text

Challenges
- The UI is seriously lacking
- No annotation support out of the box
- Only 1s time resolution (and only 1 value/s/time series)
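One way to live with the limit of one value per second per time series is to aggregate on the sender side before writing. The sketch below averages all samples that fall into the same second. This is an assumption about how one might work around the limit, not something described in the talk, and averaging is only one possible choice (sum or max may fit other metrics better).

```python
from collections import defaultdict


def collapse_to_seconds(samples):
    """Reduce (timestamp, value) samples to at most one mean value per second.

    Timestamps are non-negative Unix times, possibly fractional.
    Returns a list of (second, mean_value) sorted by second.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[int(timestamp)].append(value)
    return [(second, sum(values) / len(values))
            for second, values in sorted(buckets.items())]


points = collapse_to_seconds([(100.1, 10), (100.7, 20), (101.2, 5)])
print(points)
```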

Slide 13

Slide 13 text

Salvation is coming
- OpenTSDB 2 is around the corner
- Millisecond precision
- Annotations and metadata
- Decent API

Slide 14

Slide 14 text

Why the hell are we doing this?

Slide 15

Slide 15 text

Communication
- Replace gut feeling with real data
- Helps to avoid the blame game
- Brains prefer graphs to numbers

Slide 16

Slide 16 text

Getting insights
- We are moving towards Continuous Deployment
- Complex systems show emergent behaviour
- Graphs are the correct flight level

Slide 17

Slide 17 text

Lean Startup
- Build - Measure - Learn cycle
- You have to define measurable goals
- No. It's measuring, not guessing

Slide 18

Slide 18 text

Perspectives
- Operations (server load, traffic, disk space, ...)
- Developers (DB queries/page view, JS errors, ...)
- Product Owners (content creation, content quality, ...)
- ...

Slide 19

Slide 19 text

Some random thoughts

Slide 20

Slide 20 text

Public display
- Helps everyone feel involved
- n+1 eyes see more than n eyes
- Needs a culture of trust

Slide 21

Slide 21 text

Alerting
- Fixed values for alerts are not good enough
- Drawing attention vs. alerting
- False positives are bugs
- Don't call the on-call guy for nothing
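One common alternative to a fixed alert value is a threshold derived from the recent history of the metric itself. The sketch below flags a value that lies more than k standard deviations from the mean of recent samples. It is a generic technique chosen for illustration, not the method used by the speaker or by Skyline; the parameters `k` and `min_points` are assumptions that would need tuning.

```python
import statistics


def is_anomalous(history, current, k=3.0, min_points=5):
    """Return True if `current` deviates more than k standard deviations
    from the mean of `history`. With too little history, never alert."""
    if len(history) < min_points:
        return False
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat history: any change at all is unusual.
        return current != mean
    return abs(current - mean) > k * stdev


recent = [100, 102, 98, 101, 99]
print(is_anomalous(recent, 103), is_anomalous(recent, 120))
```

A result of `True` could feed the "drawing attention" channel first, so that only confirmed problems reach the person on call.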

Slide 22

Slide 22 text

Metrics != boring
- You can (and should) get creative with what you measure.
- Have some brainstorming sessions
- Insights may come from surprising places

Slide 23

Slide 23 text

Track team happiness
- There is no fixed scale
- It forces you to communicate
- If you listen, you can find problems in the team

Slide 24

Slide 24 text

Track ops confidence
- Create a platform where you can buy or sell your on-call shifts.
- The price for a shift tells you how confident the team is.
- This has not been tested - yet.

Slide 25

Slide 25 text

Track recruiting efforts
- Helps to get a feeling for the job market
- Reminds everyone to keep looking for new colleagues
- BTW: we are hiring ;-)

Slide 26

Slide 26 text

Questions?
- Please contact me: [email protected] @mydalon
- I'll upload the slides and tweet about it

Slide 27

Slide 27 text

One more thing

Slide 28

Slide 28 text

Please give feedback! [email protected] @mydalon

Slide 29

Slide 29 text

Image Sources:
- Plane: Felix Gottwald - www.felixgottwald.net (Creative Commons Attribution Share Alike 3.0 German)
- Talking men: Deutsche Fotothek - Peter, Richard sen.
- Money: Wikimedia contributor Avij
- Other images: Oliver Hankeln
This presentation is licenced under Creative Commons Attribution Share Alike 3.0