Gather metrics with StatsD and Graphite

Application metrics are extremely important but are often hard to gather because our PHP applications differ significantly. Using StatsD and Graphite we can gather metrics from our applications no matter what their shape or form. In this talk I will discuss how you can use StatsD to send various metrics from your PHP applications to Graphite. StatsD is a simple Node.js daemon for easy stats aggregation that makes it simple to plot application metrics on a graph in Graphite. Using the metrics that are gathered, it's possible to get an overview of what is happening with our applications in near real time, which is extremely useful. Graphite additionally allows us to produce easily understandable graphs and dashboards which, once analysed, can be used to improve our PHP applications. My talk will cover everything from setting up StatsD and Graphite to gathering the metrics from within your PHP applications. After the talk, developers should be confident enough to go away and implement these technologies in their applications.

Jeremy Quinton

February 21, 2014
Transcript

  1. Gathering Metrics with StatsD and Graphite
    Jeremy Quinton - PHP UK 2014 - @jeremyquinton
    http://www.flickr.com/photos/wwarby/
  2. None
  3. About
    • Working with the PHP/LAMP stack since 2003.
    • Open source enthusiast.
    • Big fan of the DevOps cultural and professional movement.
  4. DevOps
    • Culture
    • Automation
    • Measurement
    • Sharing

  5. Web Application
    • Go live to production
    • Majority of our traffic in a 3 month period, with a big spike at the end
    • Load testing
    • Multiple deployments with frequent changes
    • Building the app: series of sprints
    • Lead traffic
    • No application metrics
  6. You can’t optimise what you can’t measure - Juozas Kaziukenas

  7. Blog posts
    Flickr - http://code.flickr.net/2008/10/27/counting-timing/
    Etsy - http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
  8. Measuring at three levels
    • Network
    • Machine
    • Application
    http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
  9. Why is measurement important?
    • It helps us to understand things.
    • Once we understand things we can improve them.
  10. “Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.”
    - H. James Harrington
  11. Metric
    "A system or standard of measurement"

  12. System Overview
    Application -> StatsD -> Graphite (Carbon, Whisper, Web app)

  13. Types of Application Metrics
    • Counting
      How many emails did we send over the last 10 minutes?
      How many users logged into the system in the last 5 minutes?
      How many calls have we made to a particular API?
  14. PHP StatsD client library
    https://packagist.org/search/?q=statsd

  15. Example Counting Metric
    <?php
    $connection = new \Domnikl\Statsd\Connection\Socket('localhost', 8125);
    $statsd = new \Domnikl\Statsd\Client($connection);

    // simple counts
    $statsd->increment("accounts.authentication.login.attempted");

    Over the wire:
    accounts.authentication.login.attempted:1|c
    (metric name : count value | metric type)

    echo "accounts.authentication.login.attempted:1|c" | nc -w0 -u 127.0.0.1 8125
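
The wire format on this slide is simple enough to produce without a client library. The following is a minimal sketch, assuming StatsD listens on UDP port 8125 on the local machine; statsdCounterLine and sendToStatsd are hypothetical helper names, not part of any StatsD client.

```php
<?php
// Build a counter line in the "<metric name>:<count value>|c" format.
function statsdCounterLine(string $metric, int $count = 1): string
{
    return sprintf('%s:%d|c', $metric, $count);
}

// Fire and forget: write one UDP datagram and ignore any failure.
// $host and $port are assumptions and should match the StatsD config.
function sendToStatsd(string $line, string $host = '127.0.0.1', int $port = 8125): void
{
    $socket = @fsockopen('udp://' . $host, $port, $errno, $errstr, 1.0);
    if ($socket === false) {
        return; // gathering metrics should never break the application
    }
    @fwrite($socket, $line);
    fclose($socket);
}

$line = statsdCounterLine('accounts.authentication.login.attempted');
sendToStatsd($line);
echo $line, PHP_EOL;
```

Because UDP is connectionless, the send does not wait for StatsD and does not fail the request if the daemon is down.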
  16. StatsD
    • Built by Etsy.
    • Network daemon that gathers metrics and flushes them to a back-end.
    • Sends them to the back-end at a specific interval (default 10s).
    • Uses UDP: fire and forget.
  17. None
  18. StatsD
    • git clone https://github.com/etsy/statsd.git
    • Create a config file from exampleConfig.js and put it somewhere.
    • Start the daemon: node stats.js /path/to/config
  19. StatsD: sample config
    {
      port: 8125,
      backends: ['./backends/graphite'],
      graphitePort: 2003,
      graphiteHost: "127.0.0.1",
      debug: true,
      legacyNamespace: false
    }
  20. Half a System
    Application:
      <?php
      …
      $statsd->increment("accounts.authentication.login.attempted");
    Over the wire: accounts.authentication.login.attempted:1|c
    StatsD (flush interval: default every 10s) sends:
      stats.counters.accounts.authentication.login.attempted.count 23
      stats.counters.accounts.authentication.login.attempted.rate 2.3
    The stats.counters prefix and the .count and .rate suffixes are added by StatsD.
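
The two values on this slide are related by simple arithmetic. A minimal sketch, assuming the rate is the count divided by the flush interval in seconds; flushCounter is a hypothetical helper written for illustration, not StatsD code.

```php
<?php
// Summarise the increments received during one flush interval.
function flushCounter(array $increments, int $flushIntervalSeconds = 10): array
{
    $count = array_sum($increments);
    return [
        'count' => $count,                         // total for the interval
        'rate'  => $count / $flushIntervalSeconds, // per-second rate
    ];
}

$received = array_fill(0, 23, 1); // 23 login attempts, each sent as ":1|c"
$flushed = flushCounter($received);
printf("count=%d rate=%.1f\n", $flushed['count'], $flushed['rate']);
// prints "count=23 rate=2.3"
```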
  21. Namespacing Metrics
    https://github.com/etsy/statsd/blob/master/docs/namespacing.md
    http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
    <namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>
      accounts.authentication.login.attempted
      accounts.authentication.login.succeeded
      accounts.authentication.login.failed
    <product>.<environment>.<subcomponent>.<metric>
      emailvision.production.api.sendEmailResponseTime
  22. Graphite
    • Stores numeric time-series data.
    • Renders graphs of this data on demand.
    • Powerful function library.
  23. Graphite
    • Carbon: configured with /opt/graphite/conf/carbon.conf
    • Whisper: configured with /opt/graphite/conf/storage-schemas.conf and /opt/graphite/conf/storage-aggregation.conf
  24. Whisper
    • Fixed-size database, similar in design to a Round Robin Database.
    File layout: metadata | archive 1 | archive 2 | archive n
    Example time series data (timestamp, value):
      1392589140 25
      1392589150 17
      1392589160 34
      1392589170 68
    https://github.com/graphite-project/graphite-web/blob/master/docs/whisper.rst
  25. Whisper: storage-schemas.conf
    http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf
    [stats]
    pattern = ^stats.*
    retentions = 10s:6h,1m:7d,10m:5y
    Each retention is frequency:history and becomes one archive for a matching metric such as stats.counters.accounts.authentication.login.attempted.count:
      metadata | archive 1: 10s:6h | archive 2: 1m:7d | archive n: 10m:5y
    • The finest retention (10s here) must match the StatsD flush interval (10s by default).
    • Whenever metrics are old enough to leave an archive they get aggregated into the next one.
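
Since each retention is frequency:history, the number of points an archive holds is the history divided by the frequency. A sketch of that arithmetic follows; toSeconds and archivePoints are hypothetical helper names, and a year is assumed to be 365 days.

```php
<?php
// Convert a value such as "10s" or "6h" to seconds.
function toSeconds(string $spec): int
{
    $units = ['s' => 1, 'm' => 60, 'h' => 3600, 'd' => 86400, 'y' => 31536000];
    $unit = substr($spec, -1);
    return (int) substr($spec, 0, -1) * $units[$unit];
}

// Map each "frequency:history" retention to the number of points it stores.
function archivePoints(string $retentions): array
{
    $archives = [];
    foreach (explode(',', $retentions) as $retention) {
        $retention = trim($retention);
        [$frequency, $history] = explode(':', $retention);
        $archives[$retention] = intdiv(toSeconds($history), toSeconds($frequency));
    }
    return $archives;
}

print_r(archivePoints('10s:6h,1m:7d,10m:5y'));
// 10s:6h => 2160 points, 1m:7d => 10080 points, 10m:5y => 262800 points
```

The fixed number of points per archive is what makes a Whisper file fixed-size.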
  26. Whisper: storage-aggregation.conf
    http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf
    https://github.com/etsy/statsd/blob/master/docs/graphite.md
    [count]
    pattern = \.count$
    aggregationMethod = sum
    xFilesFactor = 0
    • By default Whisper uses average, so for counter metrics we want the aggregation method to be sum.
    • Aggregation methods include average, sum, min and max.
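
To see why the aggregation method matters for counters, consider six 10s points being rolled up into one 1m point. This is a small worked example with made-up values, not output from a real Whisper file.

```php
<?php
// Hypothetical .count values for six consecutive 10s intervals (one minute).
$tenSecondCounts = [23, 17, 34, 68, 25, 13];

$asSum = array_sum($tenSecondCounts);          // aggregationMethod = sum
$asAverage = $asSum / count($tenSecondCounts); // default method: average

printf("sum=%d average=%.1f\n", $asSum, $asAverage);
// prints "sum=180 average=30.0"
```

With sum, the 1m point still says 180 events happened in that minute; with average it would say 30, which under-reports the count.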
  27. Full System
    Application:
      <?php
      …
      $statsd->increment("accounts.authentication.login.attempted");
    Over the wire: accounts.authentication.login.attempted:1|c
    StatsD: flush interval, default every 10s
    Graphite: Carbon (carbon.conf), Whisper (storage-schemas.conf, storage-aggregation.conf), Web app
  28. Counting Metrics continued
    <?php
    public function authenticate(…) {
        ……..
        $statsd->increment("accounts.authentication.login.attempted");

        if (password_verify($password, $hash)) {
            ……..
            $statsd->increment("accounts.authentication.login.succeeded");
        } else {
            …….
            $statsd->increment("accounts.authentication.login.failed");
        }
    }
  29. Graphite: data buckets
    stats.counters.accounts.authentication.password.attempted.count
    stats.counters.accounts.authentication.password.failed.count
    stats.counters.accounts.authentication.password.succeeded.count

  30. Graphite

  31. Graphite

  32. Track every release
    $statsd->increment("deploys");
    echo "deploys:1|c" | nc -w0 -u 127.0.0.1 8125
    The trick to displaying events in Graphite is to apply the drawAsInfinite() function.
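
As a sketch of how the deploy counter could be overlaid on another graph, the snippet below builds a render URL with two targets. It assumes the counter ends up in Graphite as stats.counters.deploys.count and uses http://graphiteurl as a placeholder host; neither name comes from a real installation.

```php
<?php
$targets = [
    'stats.counters.accounts.authentication.login.failed.count',
    'drawAsInfinite(stats.counters.deploys.count)', // vertical line per deploy
];

// Graphite expects one "target" parameter per series, so encode each separately.
$query = implode('&', array_map(fn($t) => 'target=' . rawurlencode($t), $targets));
$url = 'http://graphiteurl/render?from=-2hours&' . $query;

echo $url, PHP_EOL;
```

drawAsInfinite() draws a vertical line wherever the series is non-zero, which makes each deploy easy to line up against the other metric.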
  33. Other Metric Types
    • Timing
    • Gauges
    • Sets
    https://github.com/etsy/statsd/blob/master/docs/metric_types.md
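
Over the wire, these types differ from counters only in the suffix after the pipe: ms for timing, g for gauges and s for sets. A small sketch follows; statsdLine is a hypothetical helper and the gauge and set metric names are invented for illustration.

```php
<?php
// Format one StatsD line as "<metric name>:<value>|<type>".
function statsdLine(string $name, $value, string $type): string
{
    return $name . ':' . $value . '|' . $type;
}

// Timing: how long something took, in milliseconds.
echo statsdLine('emailvision.production.api.sendEmailResponseTime', 320, 'ms'), PHP_EOL;
// Gauge: an arbitrary value that keeps its last reading (hypothetical metric name).
echo statsdLine('emailvision.production.queue.size', 42, 'g'), PHP_EOL;
// Set: counts unique values per flush interval (hypothetical metric name, value is a user id).
echo statsdLine('accounts.authentication.login.uniqueUsers', 1234, 's'), PHP_EOL;
```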

  34. Timing
    $statsd->startTiming("emailvision.production.api.sendEmailResponseTime");
    // code which connects and sends email
    $statsd->endTiming("emailvision.production.api.sendEmailResponseTime");

    $statsd->timing("emailvision.production.api.sendEmailResponseTime", 320);

    Over the wire: emailvision.production.api.sendEmailResponseTime:320|ms
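
Conceptually, timing a block of code means measuring the elapsed milliseconds and sending them with the ms type. The following is a rough sketch of that idea using plain PHP rather than the client library; usleep stands in for the real work.

```php
<?php
$metric = 'emailvision.production.api.sendEmailResponseTime';

$start = microtime(true);
usleep(20000); // stand-in for the code which connects and sends email
$elapsedMs = (int) round((microtime(true) - $start) * 1000);

// Same wire format as the slide: "<metric name>:<milliseconds>|ms"
$line = sprintf('%s:%d|ms', $metric, $elapsedMs);
echo $line, PHP_EOL;
```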
  35. Timing metrics under the hood
    • StatsD does a lot of aggregation for us.
    Set of data for a 10s period: 200, 210, 208, 207, 212, 217
      Lower - 200
      Upper - 217
      Mean - 209 = sum(values) / number of values
    • One timing metric produces 9 data buckets.
      stats.timers.emailvision.production.api.sendEmailResponseTime.lower
      stats.timers.emailvision.production.api.sendEmailResponseTime.mean
      stats.timers.emailvision.production.api.sendEmailResponseTime.upper
      stats.timers.emailvision.production.api.sendEmailResponseTime.mean_90
      stats.timers.emailvision.production.api.sendEmailResponseTime.median
      stats.timers.emailvision.production.api.sendEmailResponseTime.std
      stats.timers.emailvision.production.api.sendEmailResponseTime.sum
      stats.timers.emailvision.production.api.sendEmailResponseTime.sum_90
      stats.timers.emailvision.production.api.sendEmailResponseTime.upper_90
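
The nine buckets can be reproduced for the sample data with a short script. This is an illustrative sketch, not StatsD's code: timerStats is a hypothetical function, and the _90 values assume that the percentile keeps the lowest round(0.9 × count) values, which should be checked against the StatsD source.

```php
<?php
// Aggregate the timings received in one flush interval into nine values.
function timerStats(array $values, int $percentile = 90): array
{
    sort($values);
    $count = count($values);
    $sum = array_sum($values);
    $mean = $sum / $count;

    $mid = intdiv($count, 2);
    $median = $count % 2 ? $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;

    // Population standard deviation.
    $squares = array_map(fn($v) => ($v - $mean) ** 2, $values);
    $std = sqrt(array_sum($squares) / $count);

    // Assumption: keep the lowest N values, N = percentile share of the count, rounded.
    $kept = array_slice($values, 0, (int) round($percentile / 100 * $count));

    return [
        'lower' => $values[0],
        'upper' => $values[$count - 1],
        'mean' => $mean,
        'median' => $median,
        'std' => $std,
        'sum' => $sum,
        "upper_$percentile" => end($kept),
        "sum_$percentile" => array_sum($kept),
        "mean_$percentile" => array_sum($kept) / count($kept),
    ];
}

$stats = timerStats([200, 210, 208, 207, 212, 217]);
printf("lower=%d upper=%d mean=%d\n", $stats['lower'], $stats['upper'], $stats['mean']);
// prints "lower=200 upper=217 mean=209"
```

Under the stated assumption, the slowest value (217) is dropped for the _90 buckets, so upper_90 is 212 and mean_90 is 207.4.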
  36. Example timing metrics in Graphite

  37. Example timing metrics in Graphite What happened?

  38. Graphite
    http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins&format=json
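
A URL like this one can be assembled from PHP rather than typed by hand. A minimal sketch, assuming http://graphiteurl is a placeholder for your Graphite host; renderUrl is a hypothetical helper, not part of Graphite.

```php
<?php
// Build a Graphite render API URL from a target and a set of options.
function renderUrl(string $baseUrl, string $target, array $options = []): string
{
    $query = array_merge(['target' => $target], $options);
    // RFC 3986 encoding turns spaces into %20, as in the slide's URL.
    return $baseUrl . '/render?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

$url = renderUrl(
    'http://graphiteurl',
    'stats.counters.accounts.authentication.password.succeeded.count',
    ['from' => '-2hours', 'until' => 'now', 'format' => 'json', 'title' => 'Successful logins']
);

echo $url, PHP_EOL;
```

With format=json the response can be fetched and passed to json_decode(); each series should then carry a target name and a list of datapoints.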

  39. Anomaly Detection
    http://codeascraft.com/2013/06/11/introducing-kale/
    • Skyline, developed by Etsy - https://github.com/etsy/skyline
    • Simple Nagios checks - https://github.com/pyr/check-graphite
  40. Metrics from your logs
    • Parse log files and send data to StatsD.
    https://github.com/etsy/logster
    https://github.com/logstash/logstash
  41. Sample logstash config
    input {
      file {
        path => "/var/log/apache/access.log"
        type => "apache-access"
      }
    }
    filter {
      grok {
        type => "apache-access"
        pattern => "%{COMBINEDAPACHELOG}"
      }
    }
    output {
      statsd {
        # Count one hit every event by response
        increment => "apache.response.%{response}"
      }
    }
    http://logstash.net/docs/1.3.3/tutorials/metrics-from-logs
    https://github.com/logstash/logstash/blob/master/patterns/grok-patterns
  42. Graphite graph of HTTP response codes
    http://logstash.net/docs/1.3.3/tutorials/metrics-from-logs
    http://cookbook.logstash.net/recipes/statsd-metrics/

  43. Metric on PHP Warnings
    http://codeascraft.com/2010/12/08/track-every-release/

  44. Metric on PHP Warnings correlated to deploys

  45. Metrics from Logfiles
    • Logfiles contain all types of data, so there are many possibilities.
  46. Graphite
    Render API - outputs an image
    http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins
    Useful for building custom dashboards
    <img src=" " />
    http://graphite.readthedocs.org/en/latest/render_api.html
  47. Graphite: powerful function library
    http://graphite.readthedocs.org/en/latest/functions.html
    • timeShift (compare today’s number of logins with last week’s)
    • asPercent (express one metric as a percentage of another, e.g. failed/attempted logins)
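
Both functions are applied by wrapping a series name in the target parameter of a render request. The sketch below only builds the target strings; graphiteCall is a hypothetical helper and the metric names follow the login counters used earlier in the talk.

```php
<?php
// Wrap one or more arguments in a Graphite function call.
function graphiteCall(string $function, string ...$args): string
{
    return $function . '(' . implode(',', $args) . ')';
}

$attempted = 'stats.counters.accounts.authentication.login.attempted.count';
$failed = 'stats.counters.accounts.authentication.login.failed.count';

// The same series shifted back by seven days, to plot next to today's logins.
$lastWeek = graphiteCall('timeShift', $attempted, '"7d"');
// Failed logins as a percentage of attempted logins.
$failureShare = graphiteCall('asPercent', $failed, $attempted);

echo $lastWeek, PHP_EOL;
echo $failureShare, PHP_EOL;
```

Each resulting string can be passed as a target to the render API, URL-encoded, alongside the unmodified series.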
  48. StatsD: pluggable back-ends
    amqp-backend, ganglia-backend, librato-backend, socket.io-backend, statsd-backend, mongo-backend, mysql-backend, datadog-backend, opentsdb backend, influxdb backend, monitis backend, instrumental backend, hosted graphite backend, statsd aggregation backend, zabbix-backend
    https://github.com/etsy/statsd/wiki/Backends
  49. Wrapping up
    • Measure all the things.
    • Use StatsD to collect data and graph it with Graphite.
    • Gain a better understanding of your applications.
    • Improve your applications.
  50. Feedback? https://joind.in/10696 @jeremyquinton