Graph everything!
Oliver Hankeln / gutefrage.net
Saturday, 21 September 2013
Who am I?
Senior Engineer - Data and Infrastructure at
gutefrage.net GmbH
Was doing software development before
DevOps advocate
Who is Gutefrage.net?
Germany's biggest Q&A platform
#1 German site (mobile): about 5M unique users
#3 German site (desktop): about 17M unique users
> 4 million page impressions (PI) per day
Part of the Holtzbrinck group
Running several platforms (Gutefrage.net,
Helpster.de, Cosmiq, Comprano, ...)
Flight AB6188
What you will get
How do we store our metrics?
Our experiences with that setup
Why the hell are we doing that?
Some thoughts on metrics
How we store our metrics
Our requirements
Creating new metrics has to be simple
No compaction (bye bye RRDTool)
System has to scale
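To illustrate why the "no compaction" requirement matters, here is a minimal sketch. It assumes that "compaction" refers to the way RRDTool consolidates older samples into coarser averages; the sample values and the `consolidate` helper are made up for illustration.

```python
def consolidate(samples, bucket_size):
    """Average consecutive groups of `bucket_size` samples (RRD-style AVERAGE)."""
    result = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        result.append(sum(bucket) / len(bucket))
    return result


# Hypothetical response times in ms, one sample per interval, with one spike.
raw = [10, 10, 10, 490, 10, 10]

print(max(raw))              # the spike is visible in the raw data: 490
print(consolidate(raw, 6))   # after averaging six samples into one: [90.0]
```

Once the raw samples are gone, the spike cannot be recovered from the averaged value, which is why keeping every data point was a requirement.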
OpenTSDB
Written at StumbleUpon, but open source
Uses HBase as storage
Distributed system (multiple TSDs)
The ecosystem
App feeds metrics in via RabbitMQ
We base Icinga checks on the metrics
We are evaluating Etsy's Skyline for anomaly detection
We deploy sensors via Chef
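Basing an Icinga check on a metric could look roughly like the sketch below. It only follows the standard Nagios/Icinga plugin convention (exit code 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN, plus one line of status text). The metric name, the thresholds and the `evaluate` function are hypothetical, and fetching the latest value from the time series database is left out.

```python
# Exit codes defined by the Nagios/Icinga plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(name, value, warn, crit):
    """Map the latest metric value to a plugin state and a status line."""
    if value is None:
        # No data point available: the check cannot decide.
        return UNKNOWN, f"{LABELS[UNKNOWN]} - {name}: no data"
    if value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    return state, f"{LABELS[state]} - {name} = {value}"


# Hypothetical metric and thresholds; a real check would query the latest value.
state, message = evaluate("queue.length", 120, warn=100, crit=500)
print(message)
```

A real plugin would print the status line and then call `sys.exit(state)` so that Icinga picks up the result.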
Our experiences
What works well
We store about 200M data points in several
thousand time series with no issues
tcollector decouples measurement from storage
Creating new metrics is really easy
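As a rough idea of how little a new metric needs: a tcollector collector is a script that prints lines of the form `metric timestamp value tag=value` to standard output, and tcollector takes care of forwarding them to a TSD. The sketch below emits one such line; the metric name `disk.bytes.free` and the choice of measurement are assumptions for illustration, not one of the sensors from the talk.

```python
import shutil
import time


def format_line(metric, value, timestamp, tags):
    """Build one data point line in the format tcollector reads from stdout."""
    tag_str = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"{metric} {int(timestamp)} {value} {tag_str}"


# Measure something simple: free bytes on the root file system.
usage = shutil.disk_usage("/")
print(format_line("disk.bytes.free", usage.free, time.time(), {"mount": "/"}), flush=True)
```

A long-running collector would repeat the measurement in a loop with a sleep between iterations; the script itself never talks to the storage layer.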
Challenges
The UI is seriously lacking
No annotation support out of the box
Only 1 s time resolution (and only one value per second per time series)
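One way to live with the limit of one value per second per time series is to combine sub-second samples on the client before sending them. The following is a sketch under that assumption; `bucket_per_second` and the sample data are hypothetical, and whether to sum, average or take the maximum depends on the metric.

```python
from collections import defaultdict


def bucket_per_second(samples, combine=sum):
    """Collapse (timestamp, value) samples into one value per whole second."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[int(timestamp)].append(value)
    return [(second, combine(values)) for second, values in sorted(buckets.items())]


# Four events with sub-second timestamps, e.g. one per handled request.
samples = [(100.10, 1), (100.55, 1), (100.90, 1), (101.20, 1)]

print(bucket_per_second(samples))                # counts per second: [(100, 3), (101, 1)]
print(bucket_per_second(samples, combine=max))   # maximum per second: [(100, 1), (101, 1)]
```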
Salvation is coming
OpenTSDB 2 is around the corner
millisecond precision
annotations and meta data
decent API
Why the hell are we doing this?
Communication
Replace gut feeling
with real data
Helps to avoid the
blame game
Brains prefer graphs
to numbers
Getting insights
We move towards Continuous Deployment
Complex systems show emergent behaviour
Graphs are the correct flight level
Lean Startup
Build - Measure - Learn cycle
You have to define measurable goals
No. It's measuring, not guessing
Public display
Helps everyone
feel involved
n+1 eyes see more
than n eyes
Needs a culture of
trust
Alerting
Fixed values for alerts are not good enough
Drawing Attention vs. Alerting
False positives are bugs
Don't call the on-call guy for nothing
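As one possible alternative to fixed alert values, a threshold can be derived from the recent history of the metric itself. The sketch below flags a value that lies more than `k` standard deviations from the mean of recent samples. It is an illustration only, not the approach used at gutefrage.net and far simpler than what Skyline does; `is_anomalous`, `k` and `min_points` are hypothetical names and defaults.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, k=3.0, min_points=5):
    """Return True if `current` deviates more than k sigma from recent history."""
    if len(history) < min_points:
        # Too little data to judge: stay quiet rather than raise a false positive.
        return False
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > k * sigma


history = [100, 102, 98, 101, 99, 100]   # recent values of some metric

print(is_anomalous(history, 103))   # False: within the usual variation
print(is_anomalous(history, 150))   # True: far outside the usual variation
```

The same value of 103 would need a hand-tuned limit per metric with fixed thresholds; here the band follows the data.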
Metrics != boring
You can (and should) get creative with what
you measure.
Have some brainstorming sessions
Insights may come from surprising places
Track team happiness
There is no fixed
scale
It forces you to
communicate
If you listen you can
find problems in the
team
Track ops confidence
Create a platform
where you can buy or
sell your on-call
shifts.
The price for a shift
tells you how
confident the team is.
This has not been
tested - yet.
Track recruiting efforts
Helps to get a feeling
about the job market
Reminds everyone to
keep looking for new
colleagues
BTW: we are
hiring ;-)
Questions?
Please contact me:
[email protected]
@mydalon
I'll upload the slides and tweet about it
One more thing
Please give feedback!
[email protected]
@mydalon
Image Sources:
Plane: Felix Gottwald - www.felixgottwald.net (Creative Commons Attribution Share
Alike 3.0 Germany)
Talking men: Deutsche Fotothek - Peter, Richard sen.
Money: Wikimedia contributor Avij
Other images: Oliver Hankeln
This presentation is licensed under Creative Commons
Attribution Share Alike 3.0