Statsd and Graphite

Jeremy Quinton
October 02, 2015

Application metrics are extremely important but are often hard to gather, as our PHP applications differ significantly. Using StatsD and Graphite we can gather metrics from our applications no matter what their shape or form. In this talk I will discuss how you can use StatsD to send various metrics from your PHP applications to Graphite. StatsD is a simple NodeJS daemon for easy stats aggregation that makes it simple to plot application metrics on a graph in Graphite. Using the metrics gathered, it's possible to get an overview of what is happening in our applications in near real time, which is extremely useful. Graphite also lets us produce easily understandable graphs and dashboards which, once analysed, can be used to improve our PHP applications. My talk will cover everything from setting up StatsD and Graphite to gathering the metrics from within your PHP applications. After the talk, developers should be confident enough to go away and implement these technologies in their applications.

Transcript

  1. Gathering Metrics with StatsD and Graphite + Grafana. Jeremy Quinton (@jeremyquinton), PHP South Africa, October 2015. Photo: http://www.flickr.com/photos/wwarby/3296379139
  2. Twitter: @jeremyquinton. PHP developer since 2003, open source enthusiast, DevOps evangelist, infrastructure and architecture.
  3. DevOps: Culture, Automation, Measurement, Sharing

  4. Comic Relief 2010 - 2013

  5. Building the app (a series of sprints), load testing, go live to production. Lead time vs. traffic: the majority of our traffic came in a 3-month period, with a big spike at the end.
  6. (Image-only slide)
  7. Building the app (a series of sprints), load testing, go live to production, then multiple deployments with frequent changes. Lead time vs. traffic: the majority of our traffic came in a 3-month period, with a big spike at the end.
  8. When turning it off fixes the problem

  9. Building the app (a series of sprints), load testing, go live to production, then multiple deployments with frequent changes. Lead time vs. traffic: the majority of our traffic came in a 3-month period, with a big spike at the end. No application metrics.
  10. You can’t optimise what you can’t measure - Juozas Kaziukenas

  11. Blog posts: Etsy - http://codeascraft.com/2011/02/15/measure-anything-measure-everything/ ; Flickr - http://code.flickr.net/2008/10/27/counting-timing/

  12. Flickr: “The more stuff we can measure, the better our understanding of how different parts of the website work with each other gets.” (http://code.flickr.net/2008/10/27/counting-timing/)
  13. Etsy: “If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it. Sometimes we’ll draw a graph of something that isn’t moving yet, just in case it decides to make a run for it.” (http://codeascraft.com/2011/02/15/measure-anything-measure-everything/)
  14. Etsy (http://codeascraft.com/2011/02/15/measure-anything-measure-everything/): “In general, we tend to measure at three levels: network, machine, and application.” “Application metrics are usually the hardest, yet most important, of the three.” “They’re very specific to your business, and they change as your applications change.”
  15. “Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.” - H. James Harrington
  16. Metric (noun): “a system or standard of measurement.”

  17. Etsy: “We decided to make it ridiculously simple for any engineer to get anything they can count or time into a graph with almost no effort.”
  18. System overview for gathering application metrics: Application → StatsD → Graphite (Carbon, Whisper, web app) → Grafana
  19. Types of application metrics • Counting

    How many emails did we send in the last 10 minutes? How many users logged into the system in the last 5 minutes? How many calls have we made to a particular API?
  20. PHP StatsD client libraries: https://packagist.org/search/?q=statsd

  21. Example counting metric, answering the question “how many logins have been attempted?”

    Over the wire (metric name : count value | metric type):
      accounts.authentication.login.attempted:1|c

    From the shell:
      echo -n "accounts.authentication.login.attempted:1|c" | nc -4u -w1 localhost 8125

    From PHP, using the Domnikl\Statsd client:
      <?php
      $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost', 8125);
      $statsd = new Domnikl\Statsd\Client($connection);
      // simple count
      $statsd->increment("accounts.authentication.login.attempted");
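Because the counter above is just a short UDP datagram, it can also be sent without a client library. The following is a minimal sketch, assuming StatsD listens on 127.0.0.1:8125 as in the sample config on slide 26; statsd_format() and statsd_send() are hypothetical helper names, not part of Domnikl\Statsd.

```php
<?php
// Hypothetical helpers illustrating the StatsD line protocol; not a library API.

/** Build one StatsD line: <metric name>:<value>|<metric type>. */
function statsd_format(string $name, $value, string $type): string
{
    return sprintf('%s:%s|%s', $name, $value, $type);
}

/**
 * Fire-and-forget send over UDP. Returns false instead of throwing, so a
 * missing StatsD daemon never breaks the application.
 */
function statsd_send(string $line, string $host = '127.0.0.1', int $port = 8125): bool
{
    $socket = @stream_socket_client("udp://{$host}:{$port}", $errno, $errstr);
    if ($socket === false) {
        return false;
    }
    $written = @fwrite($socket, $line);
    fclose($socket);
    return $written === strlen($line);
}

$line = statsd_format('accounts.authentication.login.attempted', 1, 'c');
echo $line, PHP_EOL; // accounts.authentication.login.attempted:1|c
statsd_send($line);
```

UDP is what keeps this cheap: the application never waits for a reply, and a lost packet only costs a single data point.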
  22. Graphite dashboard

  23. Grafana graph (video: https://www.youtube.com/watch?v=sKNZMtoSHN4&index=7&list=PLDGkOdUX1Ujo3wHw9-z5Vo12YLqXRjzg2)

  24. Application → StatsD (https://github.com/etsy/statsd) over a UDP connection

  25. StatsD quickstart guide:
      git clone https://github.com/etsy/statsd
      cd statsd
      # modify exampleConfig.js, then start the daemon:
      node stats.js /path/to/config
  26. StatsD sample config:
      {
        "backends": [ "./backends/graphite" ],
        "graphite": { "legacyNamespace": false },
        "graphiteHost": "127.0.0.1",
        "graphitePort": 2003,
        "port": 8125,
        "flushInterval": 10000,
        "debug": true,
        "dumpMessages": true
      }
    flushInterval is in milliseconds, so 10000 is the 10 s default.
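  27. Half a system: Application → StatsD (flush interval: every 10 s by default)
      <?php
      …
      $statsd->increment("accounts.authentication.login.attempted");

    The application sends accounts.authentication.login.attempted:1|c for each attempt. If StatsD receives 23 of these within one 10 s interval, it flushes two data buckets:
      stats.counters.accounts.authentication.login.attempted.count = 23
      stats.counters.accounts.authentication.login.attempted.rate = 2.3 (per second)
    The stats.counters prefix and the .count/.rate suffixes are added by StatsD (legacyNamespace = false).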
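The 23 and 2.3 above follow from how StatsD treats counters: it sums every increment received during a flush interval, emits the total as .count, and emits the total divided by the interval in seconds as .rate. This is a rough PHP simulation assuming the 10 s interval from the sample config; flush_counters() is a hypothetical name, and this is not StatsD's own code.

```php
<?php
// Rough simulation of StatsD's counter handling at flush time (legacyNamespace = false).
function flush_counters(array $lines, int $flushIntervalMs = 10000): array
{
    $totals = [];
    foreach ($lines as $line) {
        [$name, $rest] = explode(':', $line, 2);
        $parts = explode('|', $rest);
        if (($parts[1] ?? '') !== 'c') {
            continue; // this sketch only handles counters
        }
        $value = (float) $parts[0];
        // "|@0.1" means the client sent only 1 in 10 events, so scale the value up.
        if (isset($parts[2]) && $parts[2][0] === '@') {
            $value /= (float) substr($parts[2], 1);
        }
        $totals[$name] = ($totals[$name] ?? 0) + $value;
    }

    $buckets = [];
    foreach ($totals as $name => $count) {
        $buckets["stats.counters.$name.count"] = $count;
        $buckets["stats.counters.$name.rate"] = $count / ($flushIntervalMs / 1000);
    }
    return $buckets;
}

// 23 login attempts arrive within one 10 s flush interval.
$received = array_fill(0, 23, 'accounts.authentication.login.attempted:1|c');
print_r(flush_counters($received));
```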
  28. Namespacing metrics (https://github.com/etsy/statsd/blob/master/docs/namespacing.md, http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/)
    <namespace>.<instrumented section>.<target (noun)>.<action (past tense verb)>
      accounts.authentication.login.attempted
      accounts.authentication.login.succeeded
      accounts.authentication.login.failed
    <product>.<environment>.<subcomponent>.<metric>
      emailvision.production.api.sendEmailResponseTime
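Each dot-separated part becomes one level of Graphite's metric tree, so a stray dot inside a part (a hostname, say) silently adds an extra level. A small sketch of a helper that enforces the convention; metric_name() is hypothetical, and replacing unexpected characters with "_" is an assumption of this sketch rather than a StatsD rule.

```php
<?php
// Hypothetical helper: join name parts with dots, replacing characters inside
// a part that would otherwise create extra levels or odd names in Graphite.
function metric_name(string ...$parts): string
{
    $clean = array_map(
        fn (string $p) => preg_replace('/[^A-Za-z0-9_\-]/', '_', $p),
        $parts
    );
    return implode('.', $clean);
}

echo metric_name('accounts', 'authentication', 'login', 'failed'), PHP_EOL;
// accounts.authentication.login.failed
echo metric_name('emailvision', 'production', 'api', 'sendEmailResponseTime'), PHP_EOL;
// emailvision.production.api.sendEmailResponseTime
```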
  29. Application → StatsD (UDP connection) → metrics flushed to the back end over a TCP connection → Graphite (Carbon, Whisper, web app) → Grafana
  30. Graphite web app: accounts.authentication.login.attempted

  31. Graphite web app

  32. Graphite: Carbon, configured with /opt/graphite/conf/carbon.conf

  33. Whisper
    • Fixed-size database, similar in design to a Round Robin Database (RRD).
    • File layout: metadata, archive 1, archive 2, …, archive n.
    • Example time series data (timestamp, value): 1392589140 25, 1392589150 17, 1392589160 34, 1392589170 68
    • https://github.com/graphite-project/graphite-web/blob/master/docs/whisper.rst
  34. Graphite Whisper: configured with /opt/graphite/conf/storage-schemas.conf and /opt/graphite/conf/storage-aggregation.conf

  35. Whisper storage-schemas.conf (http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf)
      [stats]
      pattern = ^stats.*
      retentions = 10s:6h,1m:7d,10m:5y
    Each retention is frequency:history. A metric such as stats.counters.accounts.authentication.login.attempted.count gets a Whisper file with metadata plus one archive per retention: archive 1 = 10s:6h, archive 2 = 1m:7d, archive n = 10m:5y.
    • The lowest (finest) retention must match the StatsD flush interval (10 s by default).
    • Whenever data points are old enough to leave an archive, they get aggregated.
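To see what a retentions line costs, it helps to count the points each archive holds (history divided by frequency). A sketch that parses the line above; it only understands a number followed by one of s, m (minute), h, d, w, y (365 days), which is an assumption covering this example rather than every format Graphite accepts, and retention_points() is a hypothetical name.

```php
<?php
// Seconds per unit for the subset of retention units this sketch supports.
const UNIT_SECONDS = ['s' => 1, 'm' => 60, 'h' => 3600, 'd' => 86400, 'w' => 604800, 'y' => 31536000];

function to_seconds(string $spec): int
{
    if (!preg_match('/^(\d+)([smhdwy])$/', $spec, $m)) {
        throw new InvalidArgumentException("Unsupported spec: $spec");
    }
    return (int) $m[1] * UNIT_SECONDS[$m[2]];
}

/** For each "frequency:history" archive, return its precision and point count. */
function retention_points(string $retentions): array
{
    $archives = [];
    foreach (explode(',', $retentions) as $archive) {
        [$precision, $history] = explode(':', trim($archive));
        $archives[] = [
            'precision' => to_seconds($precision),
            'points' => intdiv(to_seconds($history), to_seconds($precision)),
        ];
    }
    return $archives;
}

print_r(retention_points('10s:6h,1m:7d,10m:5y')); // points: 2160, 10080, 262800
```

The first archive's 10 s precision is the value that has to line up with StatsD's flushInterval of 10000 ms.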
  36. Whisper storage-aggregation.conf (http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf)
      [count]
      pattern = \.count$
      aggregationMethod = sum
    • Aggregation methods include average, sum, min and max.
    • Default config to get StatsD working with Graphite: https://github.com/etsy/statsd/blob/master/docs/graphite.md
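Why sum for .count buckets: when six 10 s points roll up into one 1 m point, summing keeps the total number of events, while averaging would report the mean per 10 s slot. A sketch of that roll-up using the Whisper example values from slide 33 plus two empty slots; rollup() is hypothetical, and the 0.5 threshold mirrors Whisper's xFilesFactor default (the fraction of slots that must hold data).

```php
<?php
// Sketch of a Whisper-style roll-up of several fine-grained points into one.
// null means "no data for that slot".
function rollup(array $points, string $method, float $xFilesFactor = 0.5): ?float
{
    $known = array_values(array_filter($points, fn ($v) => $v !== null));
    // Too few slots with data: store nothing rather than a misleading value.
    if (count($points) === 0 || count($known) / count($points) < $xFilesFactor) {
        return null;
    }
    switch ($method) {
        case 'sum':     return (float) array_sum($known);
        case 'average': return array_sum($known) / count($known);
        case 'min':     return (float) min($known);
        case 'max':     return (float) max($known);
        case 'last':    return (float) end($known);
    }
    throw new InvalidArgumentException("Unknown aggregation method: $method");
}

$tenSecondCounts = [25, 17, 34, 68, null, null];
echo 'sum: ', rollup($tenSecondCounts, 'sum'), PHP_EOL;         // sum: 144
echo 'average: ', rollup($tenSecondCounts, 'average'), PHP_EOL; // average: 36
```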
  37. Full system: Application → StatsD (flush interval: every 10 s by default) → Graphite (Carbon with carbon.conf; Whisper with storage-schemas.conf and storage-aggregation.conf; web app) → Grafana
      <?php
      …
      $statsd->increment("accounts.authentication.login.attempted");
      // sent over the wire as accounts.authentication.login.attempted:1|c
  38. Counting metrics continued
      <?php
      public function authenticate(…)
      {
          // ……
          $statsd->increment("accounts.authentication.login.attempted");
          if (password_verify($password, $hash)) {
              // ……
              $statsd->increment("accounts.authentication.login.succeeded");
          } else {
              // ……
              $statsd->increment("accounts.authentication.login.failed");
          }
      }
  39. Graphite data buckets:
      stats.counters.accounts.authentication.password.attempted.count
      stats.counters.accounts.authentication.password.failed.count
      stats.counters.accounts.authentication.password.succeeded.count

  40. Graphite

  41. Grafana - Graph

  42. Track every software release:
      $statsd->increment("deploys");
    or from the shell:
      echo -n "deploys:1|c" | nc -u -w1 127.0.0.1 8125
  43. Grafana annotation using the Graphite query drawAsInfinite(stats.counters.deploys.count)

  44. Grafana

  45. Other Metric Types • Timing • Gauges • Sets https://github.com/etsy/statsd/blob/master/docs/metric_types.md

  46. Timing
      $statsd->startTiming("emailvision.production.api.sendEmailResponseTime");
      // code which connects and sends email
      $statsd->endTiming("emailvision.production.api.sendEmailResponseTime");

      // or report a duration you measured yourself, in milliseconds
      $statsd->timing("emailvision.production.api.sendEmailResponseTime", 320);

    Over the wire: emailvision.production.api.sendEmailResponseTime:320|ms
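Conceptually, a start/end timing pair takes a timestamp before and after the work and sends the difference in milliseconds as an "ms" metric. A sketch of that idea using PHP's monotonic hrtime(); the Timer class is hypothetical and is not the Domnikl\Statsd implementation.

```php
<?php
// Hypothetical timer that produces the StatsD "ms" line for a block of work.
final class Timer
{
    private array $started = [];

    public function start(string $name): void
    {
        $this->started[$name] = hrtime(true); // monotonic clock, nanoseconds
    }

    /** Returns a line such as "name:320|ms" for the elapsed whole milliseconds. */
    public function end(string $name): string
    {
        if (!isset($this->started[$name])) {
            throw new LogicException("Timer $name was never started");
        }
        $elapsedMs = intdiv(hrtime(true) - $this->started[$name], 1_000_000);
        unset($this->started[$name]);
        return sprintf('%s:%d|ms', $name, $elapsedMs);
    }
}

$timer = new Timer();
$timer->start('emailvision.production.api.sendEmailResponseTime');
usleep(50_000); // stand-in for connecting to the API and sending the email
echo $timer->end('emailvision.production.api.sendEmailResponseTime'), PHP_EOL; // roughly ...:50|ms
```

A monotonic clock is used so that a system clock adjustment during the request cannot produce negative or inflated timings.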
  47. Timing metrics under the hood
    • StatsD does a lot of aggregation for us. Set of data for a 10 s period: 200, 210, 208, 207, 212, 217
    • Mean = sum(values) / number of values = 1254 / 6 = 209; Lower = 200; Upper = 217
    • One timing metric produces many data buckets, including these nine:
      stats.timers.emailvision.production.api.sendEmailResponseTime.lower
      stats.timers.emailvision.production.api.sendEmailResponseTime.mean
      stats.timers.emailvision.production.api.sendEmailResponseTime.upper
      stats.timers.emailvision.production.api.sendEmailResponseTime.mean_90
      stats.timers.emailvision.production.api.sendEmailResponseTime.median
      stats.timers.emailvision.production.api.sendEmailResponseTime.std
      stats.timers.emailvision.production.api.sendEmailResponseTime.sum
      stats.timers.emailvision.production.api.sendEmailResponseTime.sum_90
      stats.timers.emailvision.production.api.sendEmailResponseTime.upper_90
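The nine buckets can be reproduced from the slide's six values. A sketch of the per-flush statistics; the percentile handling (keep the lowest round(n × 90 / 100) sorted values) and the population standard deviation follow StatsD's general approach as best understood here, so treat the details as an approximation. timer_stats() is a hypothetical name.

```php
<?php
// Approximate per-flush statistics StatsD computes for one timer.
function timer_stats(array $values, int $pct = 90): array
{
    sort($values);
    $n = count($values);
    $sum = array_sum($values);
    $mean = $sum / $n;
    $mid = intdiv($n, 2);
    $median = $n % 2 ? $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
    // Population standard deviation.
    $variance = array_sum(array_map(fn ($v) => ($v - $mean) ** 2, $values)) / $n;
    // Values that fall within the percentile threshold (the lowest k).
    $k = max(1, (int) round($n * $pct / 100));
    $lowest = array_slice($values, 0, $k);

    return [
        'lower' => $values[0],
        'upper' => $values[$n - 1],
        'mean' => $mean,
        'median' => $median,
        'std' => sqrt($variance),
        'sum' => $sum,
        "upper_$pct" => $lowest[$k - 1],
        "mean_$pct" => array_sum($lowest) / $k,
        "sum_$pct" => array_sum($lowest),
    ];
}

print_r(timer_stats([200, 210, 208, 207, 212, 217]));
```

For the slide's data this gives mean 209, lower 200, upper 217, median 209, sum 1254, and for the 90th percentile upper_90 212, sum_90 1037 and mean_90 207.4.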
  48. Example timing metrics in Grafana

  49. Example timing metrics in Grafana What happened?

  50. Graphite render API (http://graphite.readthedocs.org/en/latest/render_api.html)
      http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins&format=json
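Because the render API is plain HTTP, the same data can be pulled into PHP. A sketch that builds the URL above with http_build_query() and totals the datapoints of a JSON response; "graphiteurl" is the slide's placeholder host, the sample JSON is made up, and the [{"target": ..., "datapoints": [[value, timestamp], ...]}] shape is assumed from Graphite's JSON format.

```php
<?php
// Build a render API URL; RFC 3986 encoding turns spaces into %20.
function render_url(string $base, array $params): string
{
    return $base . '/render?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Sum every datapoint value across all returned series; null means "no data".
function sum_datapoints(string $json): float
{
    $total = 0.0;
    foreach (json_decode($json, true, 512, JSON_THROW_ON_ERROR) as $series) {
        foreach ($series['datapoints'] as [$value, $timestamp]) {
            $total += $value ?? 0;
        }
    }
    return $total;
}

$url = render_url('http://graphiteurl', [
    'from' => '-2hours', 'until' => 'now', 'width' => 800, 'height' => 600,
    'target' => 'stats.counters.accounts.authentication.password.succeeded.count',
    'title' => 'Successful logins', 'format' => 'json',
]);
echo $url, PHP_EOL;

// Made-up sample response, for illustration only.
$sample = '[{"target": "stats.counters.accounts.authentication.password.succeeded.count",
  "datapoints": [[12, 1392589140], [null, 1392589150], [7, 1392589160]]}]';
echo sum_datapoints($sample), PHP_EOL; // 19
```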

  51. Anomaly detection
    • Skyline, developed by Etsy: https://github.com/etsy/skyline (introduced in http://codeascraft.com/2013/06/11/introducing-kale/)
    • Simple Nagios checks: https://github.com/pyr/check-graphite (discontinued)
  52. Metrics from your log files: https://github.com/etsy/logster, https://github.com/logstash/logstash

  53. Sample logstash config (http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs):
      input {
        file {
          path => "/var/log/apache/access.log"
          type => "apache-access"
        }
      }
      filter {
        grok {
          type => "apache-access"
          pattern => "%{COMBINEDAPACHELOG}"
        }
      }
      output {
        statsd {
          # Count one hit every event by response
          increment => "apache.response.%{response}"
        }
      }
    COMBINEDAPACHELOG (https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns) contains ……. %{NUMBER:response} ……
    Apache log line: 127.0.0.1 - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
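The logstash pipeline boils down to "find the status code, increment a counter named after it". A rough PHP equivalent for a single line; the regex is a simplified stand-in for grok's %{COMBINEDAPACHELOG}, not the real pattern, and response_counter() is a hypothetical name.

```php
<?php
// Extract the HTTP status code from an access-log line and turn it into
// an "apache.response.<code>" StatsD counter line.
function response_counter(string $logLine): ?string
{
    // Matches: "<request>" <3-digit status> <bytes or ->
    if (!preg_match('/"[^"]*" (\d{3}) (?:\d+|-)/', $logLine, $m)) {
        return null; // not an access-log line this sketch understands
    }
    return "apache.response.{$m[1]}:1|c";
}

$log = '127.0.0.1 - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
echo response_counter($log), PHP_EOL; // apache.response.200:1|c
```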
  54. Graphite graph of HTTP response codes (http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs, http://cookbook.logstash.net/recipes/statsd-metrics/)

  55. Metric on PHP warnings (http://codeascraft.com/2010/12/08/track-every-release/)

  56. Metric on PHP warnings correlated to deploys (http://codeascraft.com/2010/12/08/track-every-release/)

  57. Metrics from log files • Log files contain all types of data, so there are many possibilities.
  58. Graphite render API: outputs an image, useful for building custom dashboards (http://graphite.readthedocs.org/en/latest/render_api.html)
      <img src="http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins" />
  59. Graphite's powerful function library (http://graphite.readthedocs.org/en/latest/functions.html)
    • timeShift: compare today's number of logins with last week's
    • asPercent: express one metric as a percentage of another, e.g. failed/attempted logins
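What asPercent computes can be sketched point by point: for each timestamp, part × 100 / total. The series below are made up and assumed to be aligned by timestamp; returning a gap (null) when either value is missing or the total is zero is an assumption of this sketch. as_percent() is a hypothetical name, not Graphite code.

```php
<?php
// Point-by-point percentage of one aligned series relative to another.
function as_percent(array $part, array $total): array
{
    $out = [];
    foreach ($part as $i => $value) {
        $whole = $total[$i] ?? null;
        // No data, or nothing to divide by: emit a gap rather than a number.
        $out[] = ($value === null || $whole === null || $whole == 0)
            ? null
            : $value * 100 / $whole;
    }
    return $out;
}

$failed    = [2, 5, null];   // made-up failed logins per interval
$attempted = [40, 50, 30];   // made-up attempted logins per interval
print_r(as_percent($failed, $attempted)); // 5, 10, then a gap
```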
  60. StatsD pluggable back ends (https://github.com/etsy/statsd/wiki/Backends): amqp-backend, ganglia-backend, librato-backend, socket.io-backend, statsd-backend, mongo-backend, mysql-backend, datadog-backend, opentsdb backend, influxdb backend, monitis backend, instrumental backend, hosted graphite backend, statsd aggregation backend, zabbix-backend
  61. Wrapping up
    • Measure all the things.
    • Use StatsD to collect metrics and graph them with Graphite.
    • Gain a better understanding of your applications.
    • Improve your applications.