Berlin 2013 - Session - Oliver Hankeln

Monitorama
September 20, 2013

Transcript

  1. Graph everything! Oliver Hankeln / gutefrage.net. Saturday, September 21, 2013

  2. Who am I?
    Senior Engineer, Data and Infrastructure, at gutefrage.net GmbH; was doing software development before; DevOps advocate.
  3. Who is Gutefrage.net?
    Germany's biggest Q&A platform; #1 German site (mobile), about 5M unique users; #3 German site (desktop), about 17M unique users; more than 4 million page impressions (PI) per day; part of the Holtzbrinck group; running several platforms (Gutefrage.net, Helpster.de, Cosmiq, Comprano, ...).
  4. Flight AB6188
  5. What you will get
    How do we store our metrics? Our experiences with that setup. Why the hell are we doing that? Some thoughts on metrics.
  6. How we store our metrics

  7. Our requirements
    Creating new metrics has to be simple; no compaction (bye bye RRDTool); the system has to scale.
  8. OpenTSDB
    Written at StumbleUpon but open source; uses HBase as storage; distributed system (multiple TSDs).
  9. The ecosystem
    The app feeds metrics in via RabbitMQ; we base Icinga checks on the metrics; we are evaluating Etsy Skyline for anomaly detection; we deploy sensors via Chef.
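
Slide 9 mentions basing Icinga checks on stored metrics. The following is a minimal sketch of what the decision part of such a check could look like, not the setup used at gutefrage.net. It assumes the metric value has already been fetched from the time series store (the fetch is left out) and that higher values are worse. The function name `check_metric` and its thresholds are hypothetical; the exit codes follow the usual Nagios/Icinga plugin convention.

```python
# Exit codes by Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_metric(value, warn, crit):
    """Map a metric value to (exit_code, status_line).

    Hypothetical helper: assumes higher values are worse and that
    `value` is None when no data point could be read.
    """
    if value is None:
        return UNKNOWN, "UNKNOWN - no data point available"
    if value >= crit:
        return CRITICAL, f"CRITICAL - value={value}"
    if value >= warn:
        return WARNING, f"WARNING - value={value}"
    return OK, f"OK - value={value}"
```

A real plugin would print the status line and exit with the returned code, for example via `sys.exit(code)`.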
  10. Our experiences
  11. What works well
    We store about 200M data points in several thousand time series with no issues; tcollector decouples measurement from storage; creating new metrics is really easy.
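
To illustrate why creating a new metric can be this easy: a tcollector collector is a small program that prints data points to standard output as `metric timestamp value tag=value` lines, and tcollector forwards them to the TSD. The sketch below only formats such a line. The function name and the metric and tag names in the usage note are made up for illustration and do not come from the talk.

```python
import time


def format_data_point(metric, value, timestamp=None, **tags):
    """Build one 'metric timestamp value tag=value ...' line.

    Hypothetical helper; tags are sorted so the output is stable.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    tag_part = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"{metric} {ts} {value} {tag_part}".rstrip()
```

A collector would then simply call something like `print(format_data_point("app.questions.created", 42, host="web01"))` in a loop.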
  12. Challenges
    The UI is seriously lacking; no annotation support out of the box; only 1 s time resolution (and only one value per second per time series).
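
One way to live with the limit of one value per second per time series is to collapse sub-second samples on the client before sending them. The sketch below averages all samples that fall into the same second. Averaging is an assumption made for illustration (a sum or maximum may fit other metrics better), and the function name is hypothetical; the talk does not say how gutefrage.net handled this.

```python
from collections import defaultdict


def collapse_to_seconds(samples):
    """Reduce (timestamp, value) samples to one mean value per whole second.

    `samples` is an iterable of (float_timestamp, number) pairs.
    Returns a list of (int_second, mean_value), sorted by second.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts)].append(value)
    return [(sec, sum(vals) / len(vals)) for sec, vals in sorted(buckets.items())]
```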
  13. Salvation is coming
    OpenTSDB 2 is around the corner; millisecond precision; annotations and metadata; decent API.
  14. Why the hell are we doing this?
  15. Communication
    Replace gut feeling with real data; helps to avoid the blame game; brains prefer graphs to numbers.
  16. Getting insights
    We are moving towards Continuous Deployment; complex systems show emergent behaviour; graphs are the correct flight level.
  17. Lean Startup
    Build - Measure - Learn cycle; you have to define measurable goals. No. It's measuring, not guessing.
  18. Perspectives
    Operations (server load, traffic, disk space, ...); Developers (DB queries per page view, JS errors, ...); Product Owners (content creation, content quality, ...); ...
  19. Some random thoughts

  20. Public display
    Helps everyone feel involved; n+1 eyes see more than n eyes; needs a culture of trust.
  21. Alerting
    Fixed values for alerts are not good enough; drawing attention vs. alerting; false positives are bugs; don't call the on-call guy for nothing.
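
As a hedged illustration of the point that fixed alert values are not good enough, the sketch below compares a new value against a baseline derived from recent history instead of a hard-coded threshold. This is a generic mean-and-standard-deviation rule chosen for brevity. It is not the speaker's method and not Skyline's algorithm, and the function name and the default `k` are assumptions.

```python
from statistics import mean, stdev


def is_unusual(history, current, k=3.0):
    """Return True if `current` is more than k standard deviations
    away from the mean of `history`.

    With fewer than two historical points there is no baseline,
    so nothing is flagged.
    """
    if len(history) < 2:
        return False
    baseline = mean(history)
    spread = stdev(history)
    return abs(current - baseline) > k * spread
```

A result of True might be better suited to drawing attention on a dashboard than to paging someone, in line with the slide's distinction between the two.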
  22. Metrics != boring
    You can (and should) get creative with what you measure. Have some brainstorming sessions. Insights may come from surprising places.
  23. Track team happiness
    There is no fixed scale. It forces you to communicate. If you listen, you can find problems in the team.
  24. Track ops confidence
    Create a platform where you can buy or sell your on-call shifts. The price for a shift tells you how confident the team is. This has not been tested yet.
  25. Track recruiting efforts
    Helps to get a feeling for the job market; reminds everyone to keep looking for new colleagues. BTW: we are hiring ;-)
  26. Questions?
    Please contact me: oliver.hankeln@gutefrage.net, @mydalon. I'll upload the slides and tweet about it.
  27. One more thing

  28. Please give feedback! oliver.hankeln@gutefrage.net, @mydalon

  29. Image sources
    Plane: Felix Gottwald - www.felixgottwald.net (Creative Commons Attribution Share Alike 3.0 German); Talking men: Deutsche Fotothek - Peter, Richard sen.; Money: Wikimedia contributor Avij; Other images: Oliver Hankeln. This presentation is licensed under Creative Commons Attribution Share Alike 3.0.