Statsd and Graphite

Jeremy Quinton
October 02, 2015

Application metrics are extremely important but are often hard to gather because our PHP applications differ significantly. Using StatsD and Graphite we can gather metrics from our applications no matter what their shape or form. In this talk I will discuss how you can use StatsD to send various metrics from your PHP applications to Graphite. StatsD is a simple Node.js daemon for easy stats aggregation and makes it simple to plot application metrics on a graph in Graphite. Using the metrics that are gathered, it's possible to get an overview of what is happening with our applications in near real time, which is extremely useful. Graphite additionally allows us to produce easily understandable graphs and dashboards which, once analysed, can be used to improve our PHP applications. My talk will cover everything from setting up StatsD and Graphite to gathering the metrics from within your PHP applications. After the talk developers should be confident enough to go away and implement these technologies in their applications.

Transcript

  1. Gathering Metrics with StatsD and Graphite + Grafana
    Jeremy Quinton - PHP South Africa October 2015
    @jeremyquinton
    http://www.flickr.com/photos/wwarby/3296379139

  2. twitter - @jeremyquinton
    Developer - PHP since 2003
    Open Source Enthusiast
    Devops Evangelist
    Infrastructure and Architecture

  3. Devops
    Culture
    Automation
    Measurement
    Sharing

  4. Comic Relief 2010 - 2013

  5. Go Live to production
    Majority of our traffic in a 3 month period with big spike at the end
    Load Testing
    Building the app series of sprints - Lead time
    traffic

  6. (no text on slide)

  7. Multiple deployments with frequent changes
    Go Live to production
    Majority of our traffic in a 3 month period with big spike at the end
    Load Testing
    Building the app series of sprints - Lead time
    traffic

  8. When turning it off fixes the problem

  9. Multiple deployments with frequent changes
    Go Live to production
    Majority of our traffic in a 3 month period with big spike at the end
    Load Testing
    Building the app series of sprints - Lead time
    traffic
    No Application metrics

  10. You can’t optimise what you can’t measure - Juozas Kaziukenas

  11. Blog posts
    Etsy - http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
    Flickr - http://code.flickr.net/2008/10/27/counting-timing/

  12. Flickr
    “The more stuff we can measure, the better our understanding
    of how different parts of the website work with each other gets”
    Flickr - http://code.flickr.net/2008/10/27/counting-timing/

  13. Etsy
    “If Engineering at Etsy has a religion, it’s the Church of Graphs.
    If it moves, we track it. Sometimes we’ll draw a graph of
    something that isn’t moving yet, just in case it decides to make
    a run for it.”
    Etsy - http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

  14. Etsy
    “In general, we tend to measure at three levels: network, machine, and application.”
    “Application metrics are usually the hardest, yet most important, of the three.”
    “They’re very specific to your business, and they change as your applications change”
    Etsy - http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

  15. “Measurement is the first step that leads to control
    and eventually to improvement. If you can’t measure
    something, you can’t understand it. If you can’t
    understand it, you can’t control it. If you can’t control
    it, you can’t improve it.”
    ― H. James Harrington

  16. Metric
    Noun
    “a system or standard of measurement.”

  17. “We decided to make it ridiculously simple for any engineer to get
    anything they can count or time into a graph with almost no effort”
    Etsy

  18. System Overview to gather application metrics
    Application
    StatsD
    Graphite
    Carbon
    Whisper
    Web app
    Grafana

  19. Types of Application Metrics
    • Counting
    How many emails did we send in the last 10 minutes?
    How many users logged into the system in the last 5 minutes?
    How many calls have we made to a particular API?

  20. PHP StatsD client library
    https://packagist.org/search/?q=statsd

  21. Example Counting Metric
    $connection = new Domnikl\Statsd\Connection\UdpSocket('localhost', 8125);
    $statsd = new Domnikl\Statsd\Client($connection);

    // simple count - answering the question: how many logins have been attempted?
    $statsd->increment("accounts.authentication.login.attempted");

    Over the wire
    accounts.authentication.login.attempted:1|c
    Metric name: accounts.authentication.login.attempted
    Count value: 1
    Metric type: c
    echo -n "accounts.authentication.login.attempted:1|c" | nc -4u -w1 localhost 8125
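
The counting metric on this slide is just a short text datagram, so it can also be sent without a client library. The following is a minimal sketch, not the client library's implementation: the function names formatCounter and sendToStatsd are made up for illustration, and the host and port (127.0.0.1, 8125) are assumed from the slide.

```php
<?php
// Build the StatsD wire format for a counter: <metric name>:<count>|c
function formatCounter(string $name, int $count = 1): string
{
    return sprintf('%s:%d|c', $name, $count);
}

// Fire-and-forget UDP send. Host and port are assumptions taken from the
// slide. Returns the number of bytes written, or false if the socket
// could not be opened.
function sendToStatsd(string $payload, string $host = '127.0.0.1', int $port = 8125)
{
    $socket = @stream_socket_client("udp://$host:$port", $errno, $errstr, 1);
    if ($socket === false) {
        return false;
    }
    $written = @fwrite($socket, $payload);
    fclose($socket);
    return $written;
}

$payload = formatCounter('accounts.authentication.login.attempted');
echo $payload, PHP_EOL;

// UDP gives no delivery guarantee, so the application never blocks
// waiting for StatsD to answer.
sendToStatsd($payload);
```

Because the transport is UDP, a StatsD outage costs the application lost metrics rather than failed requests, which is presumably why the protocol was designed this way.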

  22. Graphite - Dashboard

  23. Grafana - Graph
    https://www.youtube.com/watch?v=sKNZMtoSHN4&index=7&list=PLDGkOdUX1Ujo3wHw9-z5Vo12YLqXRjzg2

  24. Application
    StatsD
    https://github.com/etsy/statsd
    UDP Connection

  25. StatsD - Quickstart guide
    git clone https://github.com/etsy/statsd
    Modify exampleConfig.js
    node stats.js /path/to/config

  26. StatsD
    Sample config
    {
      "backends": [ "./backends/graphite" ],
      "graphite": { "legacyNamespace": false },
      "graphiteHost": "127.0.0.1",
      "graphitePort": 2003,
      "port": 8125,
      "flushInterval": 10000,
      "debug": true,
      "dumpMessages": true
    }

  27. Half a System
    Application
    $statsd->increment("accounts.authentication.login.attempted");
    accounts.authentication.login.attempted:1|c
    StatsD
    flush interval default every 10s
    stats.counters.accounts.authentication.login.attempted.count 23
    stats.counters.accounts.authentication.login.attempted.rate 2.3
    Added by StatsD
    Legacy namespace = false
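
The two numbers on this slide (23 and 2.3) are related by the flush interval: the count is the sum of the increments received in one flush window, and the rate is that count per second. A rough sketch of the arithmetic, with a hypothetical function name (flushCounter) and the 10s interval assumed from the slide:

```php
<?php
// Sum the counter values received during one flush interval and derive
// the per-second rate, mirroring the .count and .rate buckets.
function flushCounter(array $increments, int $flushIntervalMs = 10000): array
{
    $count = array_sum($increments);
    return [
        'count' => $count,
        'rate'  => $count / ($flushIntervalMs / 1000),
    ];
}

// 23 increments of 1 arriving within a single 10s flush window.
$result = flushCounter(array_fill(0, 23, 1));
printf("count=%d rate=%.1f\n", $result['count'], $result['rate']);
```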

  28. Namespacing Metrics
    https://github.com/etsy/statsd/blob/master/docs/namespacing.md
    http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
    ...
    accounts.authentication.login.attempted

    accounts.authentication.login.succeeded

    accounts.authentication.login.failed
    ...
    emailvision.production.api.sendEmailResponseTime

  29. Application
    UDP Connection
    StatsD
    TCP Connection - metrics being flushed to back-end
    Graphite
    Carbon
    Whisper
    Web app
    Grafana

  30. Graphite Web app
    accounts.authentication.login.attempted

  31. Graphite Web app

  32. Graphite
    Carbon
    Configured with /opt/graphite/conf/carbon.conf

  33. Whisper
    • Fixed-size database, similar in design to Round Robin Database.
    metadata | archive 1 | archive 2 | ... | archive n
    Example time series data
    1392589140 25
    1392589150 17
    1392589160 34
    1392589170 68
    https://github.com/graphite-project/graphite-web/blob/master/docs/whisper.rst

  34. Graphite
    Whisper
    Configured with /opt/graphite/conf/storage-schemas.conf
    Configured with /opt/graphite/conf/storage-aggregation.conf

  35. Whisper
    storage-schemas.conf
    http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf
    [stats]
    pattern = ^stats.*
    retentions = 10s:6h,1m:7d,10m:5y
    Each retention is Frequency:History
    metadata | archive 1: 10s:6h | archive 2: 1m:7d | archive n: 10m:5y
    Matches, for example, stats.counters.accounts.authentication.login.attempted.count
    • The lowest retention (10s here) must be the same as the default flush interval for StatsD.
    • Whenever metrics are old enough to leave an archive they get aggregated.
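
To see why Whisper files are fixed-size, it helps to work out how many data points each frequency:history pair implies. The sketch below is only an illustration of that arithmetic, assuming PHP 7.1 or later and a 365-day year; the function names are made up and only the units that appear in this example are handled.

```php
<?php
// Convert a duration such as "10s", "1m", "6h", "7d" or "5y" into seconds.
// A year is assumed to be 365 days.
function durationToSeconds(string $duration): int
{
    $units = ['s' => 1, 'm' => 60, 'h' => 3600, 'd' => 86400, 'y' => 31536000];
    if (!preg_match('/^(\d+)([smhdy])$/', $duration, $m)) {
        throw new InvalidArgumentException("Unsupported duration: $duration");
    }
    return (int) $m[1] * $units[$m[2]];
}

// Split "frequency:history,frequency:history,..." into archives and work
// out how many data points each archive holds.
function describeRetentions(string $retentions): array
{
    $archives = [];
    foreach (explode(',', $retentions) as $pair) {
        [$frequency, $history] = explode(':', trim($pair));
        $step = durationToSeconds($frequency);
        $archives[] = [
            'secondsPerPoint' => $step,
            'points'          => intdiv(durationToSeconds($history), $step),
        ];
    }
    return $archives;
}

foreach (describeRetentions('10s:6h,1m:7d,10m:5y') as $i => $archive) {
    printf(
        "archive %d: one point every %ds, %d points\n",
        $i + 1,
        $archive['secondsPerPoint'],
        $archive['points']
    );
}
```

For the retentions used here this works out to 2160, 10080 and 262800 points, so five years of history costs far fewer points than keeping everything at 10s resolution would.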

  36. Whisper
    storage-aggregation.conf
    http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf
    [count]
    pattern = \.count$
    aggregationMethod = sum
    • Aggregation methods include average, sum, min, max
    • Default config to get StatsD working with Graphite
    https://github.com/etsy/statsd/blob/master/docs/graphite.md
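
The choice of aggregation method matters when finer points are rolled up into a coarser archive: counts need to be summed, otherwise the coarser archive under-reports. A small sketch of the idea, with a hypothetical aggregate function and made-up sample values:

```php
<?php
// Roll several fine-grained points up into one coarser point using one of
// the aggregation methods named on the slide.
function aggregate(array $points, string $method): float
{
    switch ($method) {
        case 'sum':
            return (float) array_sum($points);
        case 'average':
            return array_sum($points) / count($points);
        case 'min':
            return (float) min($points);
        case 'max':
            return (float) max($points);
    }
    throw new InvalidArgumentException("Unknown aggregation method: $method");
}

// Six 10s login counts (made-up values) that fall into the same 1m slot.
$tenSecondCounts = [25, 17, 34, 68, 12, 9];

printf("sum=%.1f average=%.1f\n",
    aggregate($tenSecondCounts, 'sum'),
    aggregate($tenSecondCounts, 'average')
);
```

With sum the minute keeps all 165 logins; with average it would record 27.5, which is why the [count] rule uses aggregationMethod = sum.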

  37. Full System
    Application
    $statsd->increment("accounts.authentication.login.attempted");
    accounts.authentication.login.attempted:1|c
    StatsD
    flush interval default every 10s
    Graphite
    Carbon - carbon.conf
    Whisper - storage-schemas.conf, storage-aggregation.conf
    Web app
    Grafana

  38. Counting Metrics continued
    public function authenticate(…)
    {
        // ...
        $statsd->increment("accounts.authentication.login.attempted");
        if (password_verify($password, $hash)) {
            // ...
            $statsd->increment("accounts.authentication.login.succeeded");
        } else {
            // ...
            $statsd->increment("accounts.authentication.login.failed");
        }
    }

  39. Graphite - Data buckets
    stats.counters.accounts.authentication.password.attempted.count
    stats.counters.accounts.authentication.password.failed.count
    stats.counters.accounts.authentication.password.succeeded.count

  40. Graphite

  41. Grafana - Graph

  42. Track every software release
    $statsd->increment("deploys");
    echo "deploys:1|c" | nc -w0 -u 127.0.0.1 8125

  43. Grafana - Annotation
    Graphite
    drawAsInfinite(stats.counters.deploys.count)

  44. Grafana

  45. Other Metric Types
    • Timing
    • Gauges
    • Sets
    https://github.com/etsy/statsd/blob/master/docs/metric_types.md
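
On the wire, the metric types differ only in the suffix after the pipe: c for counters, ms for timers, g for gauges and s for sets. A minimal sketch of that formatting follows; formatMetric is a made-up helper, and the gauge and set metric names are hypothetical examples rather than names from the talk.

```php
<?php
// Format a single StatsD datagram: <metric name>:<value>|<type suffix>
function formatMetric(string $name, $value, string $type): string
{
    $suffixes = ['counter' => 'c', 'timer' => 'ms', 'gauge' => 'g', 'set' => 's'];
    if (!isset($suffixes[$type])) {
        throw new InvalidArgumentException("Unknown metric type: $type");
    }
    return $name . ':' . $value . '|' . $suffixes[$type];
}

// Timer: a duration in milliseconds.
echo formatMetric('emailvision.production.api.sendEmailResponseTime', 320, 'timer'), PHP_EOL;
// Gauge: an arbitrary value that holds until it is next set (hypothetical name).
echo formatMetric('queue.emails.pending', 42, 'gauge'), PHP_EOL;
// Set: counts unique values per flush interval (hypothetical name, user id 1234).
echo formatMetric('accounts.unique.logins', 1234, 'set'), PHP_EOL;
```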

  46. Timing
    $statsd->startTiming("emailvision.production.api.sendEmailResponseTime");
    // code which connects and sends email
    $statsd->endTiming("emailvision.production.api.sendEmailResponseTime");

    // or send a measured duration in milliseconds directly
    $statsd->timing("emailvision.production.api.sendEmailResponseTime", 320);
    emailvision.production.api.sendEmailResponseTime:320|ms

  47. Timing metrics under the hood
    • StatsD does a lot of aggregation for us.
    Set of data for a 10s period - 200,210,208,207,212,217
    Mean - 209 = sum(values)/number of values
    Lower - 200
    Upper - 217
    • One timing metric produces 9 data buckets.
    stats.timers.emailvision.production.api.sendEmailResponseTime.lower
    stats.timers.emailvision.production.api.sendEmailResponseTime.mean
    stats.timers.emailvision.production.api.sendEmailResponseTime.upper
    stats.timers.emailvision.production.api.sendEmailResponseTime.mean_90
    stats.timers.emailvision.production.api.sendEmailResponseTime.median
    stats.timers.emailvision.production.api.sendEmailResponseTime.std
    stats.timers.emailvision.production.api.sendEmailResponseTime.sum
    stats.timers.emailvision.production.api.sendEmailResponseTime.sum_90
    stats.timers.emailvision.production.api.sendEmailResponseTime.upper_90
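
The nine buckets can be reproduced by hand from the sample data set. The sketch below is an approximation of that aggregation, not StatsD's own code: summariseTimer is a made-up name, the input is assumed to be non-empty, and the percentile rule (keep the fastest 90% of samples, rounded to the nearest whole sample) is an assumption about how the _90 buckets are cut.

```php
<?php
// Summarise one flush interval's worth of timings into the nine buckets.
function summariseTimer(array $values, int $percentile = 90): array
{
    sort($values);
    $count = count($values);
    $sum   = array_sum($values);
    $mean  = $sum / $count;

    // Median: the middle value, or the mean of the two middle values.
    $mid    = intdiv($count, 2);
    $median = $count % 2 === 0
        ? ($values[$mid - 1] + $values[$mid]) / 2
        : $values[$mid];

    // Population standard deviation.
    $squares = 0;
    foreach ($values as $value) {
        $squares += ($value - $mean) ** 2;
    }
    $std = sqrt($squares / $count);

    // Keep the fastest N% of samples for the percentile buckets.
    $keep = max(1, (int) round($count * $percentile / 100));
    $kept = array_slice($values, 0, $keep);

    return [
        'lower'              => $values[0],
        'upper'              => $values[$count - 1],
        'mean'               => $mean,
        'median'             => $median,
        'std'                => $std,
        'sum'                => $sum,
        "mean_$percentile"   => array_sum($kept) / $keep,
        "sum_$percentile"    => array_sum($kept),
        "upper_$percentile"  => $kept[$keep - 1],
    ];
}

$stats = summariseTimer([200, 210, 208, 207, 212, 217]);
foreach ($stats as $bucket => $value) {
    printf("%s = %s\n", $bucket, $value);
}
```

For this data set the sketch gives lower 200, upper 217 and mean 209, matching the figures above; under the assumed percentile rule the slowest sample (217) is dropped, so upper_90 is 212.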

  48. Example timing metrics in Grafana

  49. Example timing metrics in Grafana
    What happened?

  50. Graphite - Render api
    http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins&format=json
    http://graphite.readthedocs.org/en/latest/render_api.html
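
A render URL like this one can be assembled and its JSON result consumed from PHP. The sketch below assumes PHP 7.1 or later; buildRenderUrl and totalFromRenderJson are made-up helper names, graphiteurl is the placeholder host from the slide, and the sample response is invented data in the shape the render API returns for format=json (a list of series, each with [value, timestamp] pairs).

```php
<?php
// Assemble a render API URL. RFC 3986 encoding turns spaces into %20.
function buildRenderUrl(string $baseUrl, string $target, array $options = []): string
{
    $query = array_merge(['target' => $target], $options);
    return rtrim($baseUrl, '/') . '/render?'
        . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

// Add up every non-null value across all returned series.
function totalFromRenderJson(string $json): float
{
    $total = 0.0;
    foreach (json_decode($json, true) as $series) {
        foreach ($series['datapoints'] as [$value, $timestamp]) {
            if ($value !== null) {
                $total += $value;
            }
        }
    }
    return $total;
}

$url = buildRenderUrl(
    'http://graphiteurl',
    'stats.counters.accounts.authentication.password.succeeded.count',
    ['from' => '-2hours', 'until' => 'now', 'title' => 'Successful logins', 'format' => 'json']
);
echo $url, PHP_EOL;

// Invented sample payload; a null value marks a slot with no data.
$sampleResponse = '[{"target": "stats.counters.accounts.authentication.password.succeeded.count",'
    . ' "datapoints": [[25, 1392589140], [17, 1392589150], [null, 1392589160]]}]';
echo totalFromRenderJson($sampleResponse), PHP_EOL;
```

In a real application the JSON would come from fetching the URL, for example with file_get_contents or an HTTP client, rather than from a hard-coded string.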

  51. Anomaly Detection
    http://codeascraft.com/2013/06/11/introducing-kale/
    • Skyline developed by Etsy - https://github.com/etsy/skyline
    • Simple Nagios checks - https://github.com/pyr/check-graphite
    (Discontinued)

  52. Metrics from your log files
    https://github.com/etsy/logster
    https://github.com/logstash/logstash

  53. Sample logstash config
    http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs
    input {
      file {
        path => "/var/log/apache/access.log"
        type => "apache-access"
      }
    }
    filter {
      grok {
        type => "apache-access"
        pattern => "%{COMBINEDAPACHELOG}"
      }
    }
    output {
      statsd {
        # Count one hit every event by response
        increment => "apache.response.%{response}"
      }
    }
    https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
    COMBINEDAPACHELOG ……. %{NUMBER:response} ……
    Apache log line - 127.0.0.1 - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
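
What this pipeline does for each log line can be mimicked in a few lines of PHP: pull the response code out of the line and turn it into a counter increment. This is only an illustration of the idea, assuming PHP 7.1 or later; the function name is made up and the regular expression is a much simpler stand-in for the COMBINEDAPACHELOG grok pattern.

```php
<?php
// Extract the HTTP status code that follows the quoted request and build
// the matching StatsD counter datagram, or return null if none is found.
function responseMetricFromLogLine(string $line): ?string
{
    if (preg_match('/"[^"]*"\s+(\d{3})\s/', $line, $m) !== 1) {
        return null;
    }
    return sprintf('apache.response.%s:1|c', $m[1]);
}

$line = '127.0.0.1 - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
echo responseMetricFromLogLine($line), PHP_EOL;
```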

  54. Graphite graph of http response codes
    http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs
    http://cookbook.logstash.net/recipes/statsd-metrics/

  55. Metric on PHP Warnings
    http://codeascraft.com/2010/12/08/track-every-release/

  56. Metric on PHP Warnings correlated to deploys
    http://codeascraft.com/2010/12/08/track-every-release/

  57. Metrics from Logfiles
    • Logfiles contain all types of data so there are many possibilities.

  58. Graphite
    Render Api - outputs an image
    Useful for building custom dashboards
    http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins
    http://graphite.readthedocs.org/en/latest/render_api.html

  59. Graphite
    Powerful function library
    http://graphite.readthedocs.org/en/latest/functions.html
    • timeShift (compare today’s number of logins with last week’s)
    • asPercent (express one metric as a percentage of another, e.g. failed/attempted logins)
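
Both functions are used by wrapping metric names in the target expression sent to Graphite. A small sketch of building such targets follows; the helper names are made up, and the metric names are the data buckets shown earlier in the talk.

```php
<?php
// Shift a series back in time so it can be drawn next to the current one.
function timeShiftTarget(string $metric, string $offset): string
{
    return sprintf('timeShift(%s,"%s")', $metric, $offset);
}

// Express one series as a percentage of another.
function asPercentTarget(string $part, string $total): string
{
    return sprintf('asPercent(%s,%s)', $part, $total);
}

$succeeded = 'stats.counters.accounts.authentication.password.succeeded.count';
$failed    = 'stats.counters.accounts.authentication.password.failed.count';
$attempted = 'stats.counters.accounts.authentication.password.attempted.count';

// Logins one week ago, for comparison with today's series.
echo timeShiftTarget($succeeded, '7d'), PHP_EOL;
// Failed logins as a percentage of attempted logins.
echo asPercentTarget($failed, $attempted), PHP_EOL;
```

Either string can be passed as the target parameter of a render API request or typed into a Grafana graph query.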

  60. StatsD
    Pluggable back-end
    amqp-backend
    ganglia-backend
    librato-backend
    socket.io-backend
    statsd-backend
    mongo-backend
    mysql-backend
    datadog-backend
    opentsdb backend
    influxdb backend
    monitis backend
    instrumental backend
    hosted graphite backend
    statsd aggregation backend
    zabbix-backend
    https://github.com/etsy/statsd/wiki/Backends

  61. Wrapping up
    • Measure all the things.
    • Use StatsD to collect metrics and graph them with Graphite.
    • Better understanding of your applications.
    • Improve your applications.
