Slide 1

Real-time application monitoring

Igor Afonov @iafonov

Slide 2

Background
• SaaS application
• 5 production servers (2 auxiliary servers)
• Everything managed by Chef
• Apache/Passenger/Rails/MySQL/Postfix

Slide 3

Metrics that matter
• Server state
• Application state

Slide 4

How metrics data is stored (minor off-topic)
• Round-robin database
• RRDTool (C), Whisper (Python)
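
A round-robin database allocates a fixed number of slots up front and overwrites the oldest slot once it wraps around, so storage never grows. The Ruby sketch below illustrates only that idea; `RoundRobinArchive` and its parameters are hypothetical, and real RRDTool or Whisper files add several archives per metric, consolidation between them, and an on-disk format. As in Whisper, each slot keeps its own timestamp, which lets a reader skip slots left over from an earlier lap.

```ruby
# Minimal in-memory round-robin archive (illustration only, not Whisper's format).
class RoundRobinArchive
  def initialize(step:, slots:)
    @step  = step             # seconds covered by one slot
    @slots = slots            # fixed capacity; storage never grows
    @data  = Array.new(slots) # each entry: [slot_timestamp, value] or nil
  end

  # Align the timestamp to the step and write into slot (interval % slots),
  # overwriting whatever older interval used that slot before.
  def update(timestamp, value)
    aligned = timestamp - (timestamp % @step)
    @data[(aligned / @step) % @slots] = [aligned, value]
  end

  # Return [timestamp, value] pairs for the window ending at `now`,
  # skipping empty slots and slots holding data from an earlier lap.
  def fetch(now)
    oldest = now - (now % @step) - (@slots - 1) * @step
    @data.compact.select { |ts, _| ts >= oldest }.sort_by(&:first)
  end
end

archive = RoundRobinArchive.new(step: 10, slots: 3)
[0, 10, 20, 30].each_with_index { |t, i| archive.update(t, i) }
p archive.fetch(30) # => [[10, 1], [20, 2], [30, 3]]
```

With three 10-second slots, the write at t=30 reuses the slot that held t=0, so only the three most recent intervals survive.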

Slide 5

Server state
• Munin: storage and graphs
• Monit: alerts, basic rescue actions
(Diagram: one munin-server; each shard runs a munin-node.)

Slide 6

Munin
• Server pulls data from nodes
• Gathers basic server health data
• A lot of custom plugins
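
Munin plugins are plain executables that munin-node runs: with the argument `config` a plugin prints graph metadata, and with no argument it prints `field.value N` lines. A minimal Ruby plugin could look like the sketch below, assuming a hypothetical queue directory whose file count is worth graphing; the path, graph names and the `QUEUE_DIR` variable are made up for illustration.

```ruby
#!/usr/bin/env ruby
# Hypothetical Munin plugin: graphs how many files sit in a queue directory.
# munin-node runs it with "config" for graph metadata, and with no argument for values.

# Hypothetical default path; can be overridden through the environment.
QUEUE_DIR = ENV.fetch("QUEUE_DIR", "/var/spool/myapp/queue")

# Graph metadata, printed when invoked as `plugin config`.
def config_output
  <<~CONFIG
    graph_title Queued jobs
    graph_vlabel jobs
    graph_category myapp
    queued.label queued
  CONFIG
end

# One "field.value N" line per field, printed on a normal fetch.
def fetch_output(count)
  "queued.value #{count}\n"
end

# Counts entries in the directory (0 if it does not exist).
def queued_count(dir)
  Dir.glob(File.join(dir, "*")).size
end

if $PROGRAM_NAME == __FILE__
  print(ARGV.first == "config" ? config_output : fetch_output(queued_count(QUEUE_DIR)))
end
```

Once the script is symlinked into /etc/munin/plugins and munin-node is restarted, `munin-run <plugin-name> config` and `munin-run <plugin-name>` show what the server will receive; the directory could be set with an `env.QUEUE_DIR` line in munin-node's plugin configuration.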

Slide 7

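Munin + Chef

Client config template: the recipe looks up the monitoring servers with search(:node, "role:monitoring") and passes them to the template as @munin_servers, so each node accepts connections only from those IPs.

```erb
<% @munin_servers.sort.each do |server| -%>
allow ^<%= server[:ipaddress].to_s.gsub(/\./, '\.') %>$
<% end -%>
```

Server config template: the recipe finds every node that has Munin attributes with search(:node, "munin:[* TO *]") and passes them as @munin_nodes.

```erb
<% @munin_nodes.each do |system| -%>
[<%= system[:hostname] %>]
  address <%= system[:ipaddress] %>
  use_node_name yes
<% end -%>
```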
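
To check what such templates expand to, they can be rendered outside Chef with Ruby's standard ERB library. In this sketch the node data is a hypothetical stand-in (plain hashes instead of the node objects Chef's search returns), and each template receives its list as an instance variable, which is how Chef exposes a template resource's `variables`. `.sort` is left out because plain hashes are not comparable.

```ruby
require "erb"

# Hypothetical stand-ins for Chef search results; plain hashes answer the
# same [:hostname] / [:ipaddress] lookups the templates use.
MONITORING_SERVERS = [{ hostname: "monitor1", ipaddress: "10.0.0.5" }]
MUNIN_NODES = [
  { hostname: "shard1", ipaddress: "10.0.0.11" },
  { hostname: "shard2", ipaddress: "10.0.0.12" }
]

# Client side: one anchored, dot-escaped regex per monitoring server.
CLIENT_TEMPLATE = <<~'ERB'
  <% @munin_servers.each do |server| -%>
  allow ^<%= server[:ipaddress].to_s.gsub(/\./, '\.') %>$
  <% end -%>
ERB

# Server side: one host section per node.
SERVER_TEMPLATE = <<~'ERB'
  <% @munin_nodes.each do |system| -%>
  [<%= system[:hostname] %>]
    address <%= system[:ipaddress] %>
    use_node_name yes
  <% end -%>
ERB

# Mimics Chef's `variables`: each entry becomes an instance variable.
class TemplateContext
  def initialize(vars)
    vars.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  # trim_mode "-" makes "-%>" swallow the following newline.
  def render(source)
    ERB.new(source, trim_mode: "-").result(binding)
  end
end

puts TemplateContext.new(munin_servers: MONITORING_SERVERS).render(CLIENT_TEMPLATE)
puts TemplateContext.new(munin_nodes: MUNIN_NODES).render(SERVER_TEMPLATE)
```

The client output is `allow ^10\.0\.0\.5$`: escaping the dots keeps munin-node's regex from matching unrelated addresses.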

Slide 8

Application state
• Subscriptions
• Logins
• Orders
• Business metrics
• ...

Slide 9

Our setup
(Diagram: several shards and a monitoring server running StatsD and Graphite: Whisper, Carbon, WebApp.)

Slide 10

StatsD
• Lightweight proxy to Graphite
• 300 lines of JavaScript (Node.js)
• Uses UDP (small packets, non-blocking)
• 10+ implementations
• Simple text protocol
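
Using UDP means the client only writes a datagram and never waits for an answer. A minimal sender built on Ruby's standard `socket` library is sketched below; the `UdpMetricSender` class is hypothetical (real client libraries add sampling, batching and configuration), and 8125 is StatsD's default port.

```ruby
require "socket"

# Fire-and-forget StatsD-style sender (sketch). UDP needs no connection or
# acknowledgement, so a missing or slow StatsD server never stalls the app.
class UdpMetricSender
  def initialize(host, port)
    @host = host
    @port = port
    @socket = UDPSocket.new
  end

  # Sends one datagram; network errors are swallowed on purpose so that
  # metrics can never break the request that produced them.
  def send_line(line)
    @socket.send(line, 0, @host, @port)
  rescue SystemCallError, SocketError
    nil
  end
end

sender = UdpMetricSender.new("127.0.0.1", 8125)
sender.send_line("users.new:1|c")
```

If nothing listens on port 8125 the datagram is simply dropped, which is the intended trade-off: losing an occasional metric is preferable to slowing down a request.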

Slide 11

StatsD
• Count: counter:1|c
• Measure (timing): metric:200|ms
• Gauge: value:9000|g
• Supports sampling, e.g. counter:1|c|@0.1
• Many client-side libraries
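
On the server side, StatsD collects these lines over a flush interval (10 seconds by default) and then forwards aggregates to Graphite. The sketch below is a simplified, hypothetical aggregator, not StatsD's actual code: it handles counters (scaled up by the `@rate` sample rate), timers (reduced to a mean here) and gauges (last value wins), and emits Graphite plaintext lines loosely following StatsD's legacy `stats.` namespace.

```ruby
# Simplified, hypothetical StatsD-style aggregator (one metric per line;
# no sets, percentiles or multi-metric packets).
class Aggregator
  attr_reader :counters, :timers, :gauges

  def initialize
    @counters = Hash.new(0.0)
    @timers = Hash.new { |hash, key| hash[key] = [] }
    @gauges = {}
  end

  # Parses "name:value|type" with an optional "|@rate" sample-rate suffix.
  def ingest(line)
    name, rest = line.split(":", 2)
    raise ArgumentError, "malformed line: #{line.inspect}" if rest.nil?

    value, type, rate = rest.split("|")
    value = Float(value)
    case type
    when "c"
      # A client sampling at 0.5 sends about half its events, so scale back up.
      sample_rate = rate ? Float(rate.delete_prefix("@")) : 1.0
      @counters[name] += value / sample_rate
    when "ms" then @timers[name] << value
    when "g" then @gauges[name] = value # last value wins
    else raise ArgumentError, "unknown metric type: #{type.inspect}"
    end
  end

  # Returns Graphite plaintext lines ("path value timestamp") for one flush
  # interval and resets counters and timers; gauges keep their last value.
  def flush(now, interval = 10)
    lines = []
    @counters.each { |name, count| lines << "stats.#{name} #{count / interval} #{now}" }
    @timers.each { |name, vals| lines << "stats.timers.#{name}.mean #{vals.sum / vals.size} #{now}" }
    @gauges.each { |name, value| lines << "stats.gauges.#{name} #{value} #{now}" }
    @counters.clear
    @timers.clear
    lines
  end
end

agg = Aggregator.new
%w[users.new:1|c users.new:1|c|@0.5 cron.backup:200|ms queue.size:9000|g].each { |l| agg.ingest(l) }
puts agg.flush(Time.now.to_i)
```

For `users.new:1|c|@0.5` the aggregator adds 2, so the two lines count as three events; with a 10-second interval that becomes a rate of 0.3 per second.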

Slide 12

StatsD (Ruby client)

```ruby
require 'statsd-instrument'

StatsD.server = '178.22.33.88:8125'

# increment a counter
StatsD.increment("users.new")

# measure how long a task takes
StatsD.measure("cron.#{task}") do
  task.run
end

# meta-programming: count every call to Subscription#subscribe
Subscription.extend StatsD::Instrument
Subscription.statsd_count :subscribe, 'subscriptions'
```
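
The last two lines instrument an existing method so that every call bumps a counter, without touching the method body. The same idea can be sketched in plain Ruby with `Module#prepend`; this is a hypothetical illustration, not how `StatsD::Instrument` is actually implemented, and `CountingInstrument`, `count_calls` and the in-memory `COUNTS` hash stand in for a real StatsD client.

```ruby
# Hypothetical stand-in for a StatsD client: just tallies counter names.
COUNTS = Hash.new(0)

module CountingInstrument
  # Wraps instance method `method_name` so each call increments `metric`.
  def count_calls(method_name, metric)
    wrapper = Module.new do
      define_method(method_name) do |*args, &block|
        result = super(*args, &block) # call the original method
        COUNTS[metric] += 1           # only reached if it did not raise
        result
      end
    end
    prepend(wrapper)
  end
end

class Subscription
  extend CountingInstrument

  def subscribe(plan)
    "subscribed to #{plan}"
  end

  count_calls :subscribe, "subscriptions"
end

Subscription.new.subscribe("pro")
Subscription.new.subscribe("basic")
p COUNTS["subscriptions"] # => 2
```

Because the wrapper module is prepended, `super` reaches the original `subscribe`, and the return value passes through unchanged.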

Slide 13

Graphite
• Python everywhere
• Whisper: round-robin database that stores the data
• Carbon: backend that receives metrics and writes them to Whisper
• Graphite webapp: draws graphs, works with the data
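
Carbon also accepts data without StatsD in front of it. Its plaintext protocol is one line per data point, `metric.path value unix_timestamp`, sent over TCP (port 2003 by default). A small Ruby sketch with hypothetical helper names:

```ruby
require "socket"

# Formats one data point in Carbon's plaintext protocol.
def carbon_line(path, value, timestamp = Time.now.to_i)
  "#{path} #{value} #{timestamp}\n"
end

# Opens a TCP connection, writes the lines, and closes the socket.
def send_to_carbon(host, port, lines)
  TCPSocket.open(host, port) do |socket|
    lines.each { |line| socket.write(line) }
  end
end
```

For example, `send_to_carbon("graphite.example.com", 2003, [carbon_line("shard1.orders.total", 42)])` would record one point; the host name and metric path are placeholders. For per-request events StatsD remains the better fit, since this TCP write blocks and aggregates nothing.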

Slide 14

Graphite storage schemas (storage-schemas.conf)

```ini
[stats]
priority = 100
pattern = ^stats\..*
retentions = 10:2160,60:10080,600:262974
```
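
Each `seconds_per_point:points` pair in `retentions` defines one archive, and its coverage is simply the product of the two numbers. The short Ruby helper below (hypothetical, for checking a schema by hand) expands the line above: 10 s × 2160 = 6 hours, 60 s × 10080 = 7 days, and 600 s × 262974 ≈ 5 years.

```ruby
# Parses a Graphite retentions string of "seconds_per_point:points" pairs
# and returns [resolution_s, points, span_s] for each archive.
def retention_spans(retentions)
  retentions.split(",").map do |archive|
    step, points = archive.split(":").map { |n| Integer(n) }
    [step, points, step * points]
  end
end

retention_spans("10:2160,60:10080,600:262974").each do |step, points, span|
  puts format("%4ds x %6d points = %9d s (%.2f days)", step, points, span, span / 86_400.0)
end
```

Whisper keeps one archive per pair, so queries over recent data get 10-second resolution while older data is served at coarser steps.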

Slide 15

Problems
• Basic UI
• Complex installation and setup

Slide 16

telemetry.io
• Fun project
• SaaS application
• Custom front-end for StatsD + Graphite
• Optimized for big screens (TVs)
• Free
• Possibly open source

Slide 17

telemetry.io
(Diagram: several clients send metrics to telemetry.io, which runs StatsD, Graphite (Whisper, Carbon, WebApp) and a custom front-end.)

Slide 18

Workflow
• Get an access token
• Use one of the available client libraries, or write your own
• Prepend the token to each metric name
• Integrate into the application, e.g.:

```ruby
StatsD.increment("#{token}.subscribers")
```
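
Prepending the token turns one shared StatsD/Graphite instance into per-customer namespaces, because Graphite treats each dot-separated segment as a level in the metric tree. The helper below is a hypothetical sketch of that convention; `TokenNamespace` and its token format (letters, digits, `_` and `-`) are assumptions, chosen so that a token can never contain `.` (Graphite's path separator) or `:` and `|` (StatsD's delimiters).

```ruby
# Hypothetical multi-tenant helper: every metric name is namespaced by the
# customer's access token, e.g. "abc123.subscribers".
class TokenNamespace
  # Assumed token format; excludes ".", ":" and "|" by construction.
  VALID_TOKEN = /\A[A-Za-z0-9_-]+\z/

  def initialize(token)
    raise ArgumentError, "invalid token: #{token.inspect}" unless token.match?(VALID_TOKEN)

    @token = token
  end

  # "subscribers" -> "<token>.subscribers"
  def metric(name)
    "#{@token}.#{name}"
  end

  # Full StatsD counter line for the prefixed metric.
  def increment_line(name, count = 1)
    "#{metric(name)}:#{count}|c"
  end
end

ns = TokenNamespace.new("abc123")
puts ns.increment_line("subscribers") # prints abc123.subscribers:1|c
```

Rejecting malformed tokens up front means a bad token fails loudly in the client instead of silently landing metrics under the wrong Graphite path.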

Slide 23

Implementation
• Ruby on Rails
• CoffeeScript
• Chef (full node bootstrap in 10 minutes)

Slide 24

Links
• http://telemetry.io
• http://code.flickr.com/blog/2008/10/27/counting-timing/
• http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/

Slide 25

http://iafonov.github.com/
@iafonov