Slide 1

Slide 1 text

Gathering Metrics with StatsD and Graphite http://www.flickr.com/photos/wwarby/ Jeremy Quinton - PHP Cape Town 2015 @jeremyquinton

Slide 2

Slide 2 text

twitter - @jeremyquinton Developer - PHP since 2003 Open Source Enthusiast Devops Evangelist Infrastructure and Architecture

Slide 3

Slide 3 text

Devops • Culture • Automation • Measurement • Sharing

Slide 4

Slide 4 text

Web Application • Go live to production • Majority of our traffic in a 3 month period with a big spike at the end • Load testing • Multiple deployments with frequent changes • Building the app in a series of sprints - Lead traffic • No application metrics

Slide 5

Slide 5 text

You can’t optimise what you can’t measure - Juozas Kaziukenas

Slide 6

Slide 6 text

Blog posts
Flickr - http://code.flickr.net/2008/10/27/counting-timing/
Etsy - http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Slide 7

Slide 7 text

Measuring at three levels • Network • Machine • Application http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Slide 8

Slide 8 text

Why is measurement important? • It helps us to understand things. • Once we understand things we can improve them.

Slide 9

Slide 9 text

“Measurement is the first step that leads to control and eventually to improvement. If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.” ― H. James Harrington

Slide 10

Slide 10 text

"A system or standard of measurement" Metric

Slide 11

Slide 11 text

System Overview: Application -> StatsD -> Graphite (Carbon, Whisper, Web app)

Slide 12

Slide 12 text

Types of Application Metrics • Counting
How many emails did we send over the last 10 minutes?
How many users logged into the system in the last 5 minutes?
How many calls have we made to a particular API?

Slide 13

Slide 13 text

https://packagist.org/search/?q=statsd PHP statsD client library

Slide 14

Slide 14 text

Application: increment("accounts.authentication.login.attempted");
Over the wire: accounts.authentication.login.attempted:1|c
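As a rough illustration of the wire format above, here is a minimal Python sketch that builds the text line for a counter. The function name `counter_line` is made up for this example and does not come from any particular client library.

```python
def counter_line(name: str, value: int = 1) -> str:
    """Build the StatsD text line for a counter: <name>:<value>|c"""
    return f"{name}:{value}|c"


# The increment() call on the slide would produce this line.
print(counter_line("accounts.authentication.login.attempted"))
```

A client library's increment() is then mostly a thin wrapper that formats this line and hands it to the network layer.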

Slide 15

Slide 15 text

StatsD • Built by Etsy. • Network daemon that gathers and aggregates metrics. • Flushes them to a back-end at a set interval (default 10s). • Uses UDP: fire and forget.
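The "fire and forget" idea can be sketched in a few lines of Python. This is only an illustration: `send_metric` is a hypothetical helper, and the host and port defaults assume a StatsD daemon listening locally on 8125.

```python
import socket


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one metric line over UDP and ignore any delivery problem."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    except OSError:
        pass  # fire and forget: a metrics failure must not break the app
    finally:
        sock.close()


# No reply is expected, and nothing fails if no daemon is listening.
send_metric("accounts.authentication.login.attempted:1|c")
```

Because UDP has no handshake and no acknowledgement, the application pays almost nothing for each metric, at the price of possibly losing some packets.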

Slide 17

Slide 17 text

StatsD • git clone https://github.com/etsy/statsd.git • Create a config file from exampleConfig.js and put it somewhere. • Start the Daemon: node stats.js /path/to/config

Slide 18

Slide 18 text

StatsD sample config
{
  port: 8125,
  backends: ['./backends/graphite'],
  graphitePort: 2003,
  graphiteHost: "127.0.0.1",
  debug: true,
  graphite: { legacyNamespace: false }
}

Slide 19

Slide 19 text

Half a System: Application -> StatsD (flush interval default every 10s)
Application: increment("accounts.authentication.login.attempted");
Over the wire: accounts.authentication.login.attempted:1|c
Added by StatsD on flush (stats.counters prefix, .count and .rate suffixes):
stats.counters.accounts.authentication.login.attempted.count 23
stats.counters.accounts.authentication.login.attempted.rate 2.3
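The relationship between the two values (23 increments in a 10s interval gives a rate of 2.3 per second) can be sketched as follows. This is a simplified Python model of a flush, not StatsD's actual code; the `flush` function and its reset behaviour are assumptions made for illustration.

```python
from collections import Counter

FLUSH_INTERVAL_S = 10  # StatsD default flush interval


def flush(counters: Counter) -> dict:
    """Turn raw counters into the .count and .rate values for one interval."""
    out = {}
    for name, count in counters.items():
        out[f"stats.counters.{name}.count"] = count
        out[f"stats.counters.{name}.rate"] = count / FLUSH_INTERVAL_S
    counters.clear()  # the next interval starts counting from zero
    return out


counters = Counter()
for _ in range(23):  # 23 increments arrive within one interval
    counters["accounts.authentication.login.attempted"] += 1
print(flush(counters))
```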

Slide 20

Slide 20 text

Namespacing Metrics
https://github.com/etsy/statsd/blob/master/docs/namespacing.md
http://matt.aimonetti.net/posts/2013/06/26/practical-guide-to-graphite-monitoring/
...
accounts.authentication.login.attempted
accounts.authentication.login.succeeded
accounts.authentication.login.failed
...
emailvision.production.api.sendEmailResponseTime

Slide 21

Slide 21 text

Graphite • Store numeric time-series data. • Render graphs of this data on demand. • Powerful function library.

Slide 22

Slide 22 text

Graphite
• Carbon: configured with /opt/graphite/conf/carbon.conf
• Whisper: configured with /opt/graphite/conf/storage-schemas.conf and /opt/graphite/conf/storage-aggregation.conf

Slide 23

Slide 23 text

Whisper • Fixed-size database, similar in design to a Round Robin Database (RRD).
File layout: metadata | archive 1 | archive 2 | ... | archive n
Example time series data (timestamp value):
1392589140 25
1392589150 17
1392589160 34
1392589170 68
https://github.com/graphite-project/graphite-web/blob/master/docs/whisper.rst

Slide 24

Slide 24 text

Whisper storage-schemas.conf
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-schemas-conf
[stats]
pattern = ^stats.*
retentions = 10s:6h,1m:7d,10m:5y
Example metric matched by this pattern: stats.counters.accounts.authentication.login.attempted.count
Each retention is Frequency:History and maps to one archive in the Whisper file (after the metadata): archive 1 = 10s:6h, archive 2 = 1m:7d, archive n = 10m:5y.
• The highest-resolution retention (the first one) must match the StatsD flush interval (default 10s).
• Whenever data points are old enough to leave an archive they get aggregated into the next one.
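To make the Frequency:History notation concrete, the following Python sketch works out how many data points each archive holds for the retentions on this slide. The helper names are hypothetical, and the sketch assumes a year counts as 365 days.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "y": 86400 * 365}


def to_seconds(spec: str) -> int:
    """Convert a value such as '10s' or '7d' to seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def archive_points(retentions: str) -> list:
    """Return (seconds_per_point, number_of_points) for each archive."""
    archives = []
    for part in retentions.split(","):
        frequency, history = part.strip().split(":")
        step = to_seconds(frequency)
        archives.append((step, to_seconds(history) // step))
    return archives


for step, points in archive_points("10s:6h,1m:7d,10m:5y"):
    print(f"one point every {step}s, {points} points kept")
```

Because the number of points per archive is fixed up front, the file size never grows, which is what "fixed-size database" means in practice.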

Slide 25

Slide 25 text

Whisper storage-aggregation.conf
http://graphite.readthedocs.org/en/latest/config-carbon.html#storage-aggregation-conf
https://github.com/etsy/statsd/blob/master/docs/graphite.md
[count]
pattern = \.count$
aggregationMethod = sum
xFilesFactor = 0
• By default Whisper uses average, so for counter metrics we want the aggregation method to be sum.
• Aggregation methods include average, sum, min, max.
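Why sum matters for counters can be shown with a small Python sketch that rolls six 10-second points up into one 1-minute point. The `rollup` function is an illustrative stand-in for Whisper's aggregation, not its real implementation, and the last two sample values are made up.

```python
def rollup(points, method):
    """Combine high-resolution points into one lower-resolution point."""
    known = [p for p in points if p is not None]  # gaps are stored as None
    if not known:
        return None
    if method == "sum":
        return sum(known)
    if method == "average":
        return sum(known) / len(known)
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    raise ValueError(f"unknown aggregation method: {method}")


logins_per_10s = [25, 17, 34, 68, 12, 24]  # six points = one minute
print("sum:", rollup(logins_per_10s, "sum"))
print("average:", rollup(logins_per_10s, "average"))
```

With average, a minute that really saw 180 logins would be stored as 30, so counter graphs would appear to shrink as data ages into coarser archives.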

Slide 26

Slide 26 text

Full System: Application -> StatsD (flush interval default every 10s) -> Graphite (Carbon, Whisper, Web app)
Application: increment("accounts.authentication.login.attempted");
Over the wire: accounts.authentication.login.attempted:1|c
Graphite configuration: carbon.conf, storage-schemas.conf, storage-aggregation.conf

Slide 27

Slide 27 text

Counting Metrics continued
$statsd->increment("accounts.authentication.login.attempted");
if (password_verify($password, $hash)) {
    // ...
    $statsd->increment("accounts.authentication.login.succeeded");
} else {
    // ...
    $statsd->increment("accounts.authentication.login.failed");
}

Slide 28

Slide 28 text

Graphite stats.counters.accounts.authentication.password.attempted.count stats.counters.accounts.authentication.password.failed.count stats.counters.accounts.authentication.password.succeeded.count Data buckets

Slide 29

Slide 29 text

Graphite

Slide 30

Slide 30 text

Graphite

Slide 31

Slide 31 text

Track every release
The trick to displaying events in Graphite is to apply the drawAsInfinite() function.
$statsd->increment("deploys");
echo "deploys:1|c" | nc -w0 -u 127.0.0.1
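One way to overlay deploys on another graph is to request both series from the render API in one call, wrapping the deploy counter in drawAsInfinite(). The Python sketch below only builds such a URL. The `render_url` helper is hypothetical, `http://graphiteurl` is the placeholder host used elsewhere in these slides, and the deploy metric path assumes the non-legacy StatsD namespace.

```python
from urllib.parse import urlencode


def render_url(base: str, targets: list, params: dict) -> str:
    """Build a Graphite render API URL with one or more targets."""
    query = [("target", t) for t in targets] + list(params.items())
    return f"{base}/render?{urlencode(query)}"


url = render_url(
    "http://graphiteurl",
    [
        "stats.counters.accounts.authentication.login.failed.count",
        # each non-zero deploy point is drawn as a vertical line
        "drawAsInfinite(stats.counters.deploys.count)",
    ],
    {"from": "-2hours", "until": "now"},
)
print(url)
```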

Slide 32

Slide 32 text

Other Metric Types • Timing • Gauges • Sets https://github.com/etsy/statsd/blob/master/docs/metric_types.md

Slide 33

Slide 33 text

Timing
$statsd->startTiming("emailvision.production.api.sendEmailResponseTime");
// code which connects and sends email
$statsd->endTiming("emailvision.production.api.sendEmailResponseTime");
or, with a value measured yourself:
$statsd->timing("emailvision.production.api.sendEmailResponseTime", 320);
Over the wire: emailvision.production.api.sendEmailResponseTime:320|ms
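The start/end timing pattern amounts to measuring elapsed milliseconds and emitting a `|ms` line. A minimal Python sketch of that idea follows; the `timed` context manager and its `emit` parameter are assumptions for illustration, not part of any StatsD client.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(name, emit=print):
    """Measure the wrapped block and emit '<name>:<milliseconds>|ms'."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = round((time.perf_counter() - start) * 1000)
        emit(f"{name}:{elapsed_ms}|ms")


with timed("emailvision.production.api.sendEmailResponseTime"):
    time.sleep(0.05)  # stand-in for connecting and sending the email
```

In a real application `emit` would be the function that sends the line to StatsD over UDP rather than `print`.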

Slide 34

Slide 34 text

Timing metrics under the hood
• StatsD does a lot of aggregation for us.
Set of data for a 10s period - 200,210,208,207,212,217
Mean - 209 = sum(values)/number of values
Lower - 200
Upper - 217
• One timing metric produces 9 data buckets.
stats.timers.emailvision.production.api.sendEmailResponseTime.lower
stats.timers.emailvision.production.api.sendEmailResponseTime.mean
stats.timers.emailvision.production.api.sendEmailResponseTime.upper
stats.timers.emailvision.production.api.sendEmailResponseTime.mean_90
stats.timers.emailvision.production.api.sendEmailResponseTime.median
stats.timers.emailvision.production.api.sendEmailResponseTime.std
stats.timers.emailvision.production.api.sendEmailResponseTime.sum
stats.timers.emailvision.production.api.sendEmailResponseTime.sum_90
stats.timers.emailvision.production.api.sendEmailResponseTime.upper_90
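The simpler buckets can be reproduced by hand from the sample data on this slide. The Python sketch below computes only lower, upper, mean, median and sum. It leaves out std and the 90th percentile buckets, and `timer_stats` is a hypothetical name rather than StatsD's own code.

```python
import statistics


def timer_stats(values):
    """Summarise the timings received during one flush interval."""
    ordered = sorted(values)
    return {
        "lower": ordered[0],
        "upper": ordered[-1],
        "mean": sum(ordered) / len(ordered),
        "median": statistics.median(ordered),
        "sum": sum(ordered),
    }


stats = timer_stats([200, 210, 208, 207, 212, 217])
for bucket, value in stats.items():
    print(f"{bucket}: {value}")
```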

Slide 35

Slide 35 text

Example timing metrics in Graphite

Slide 36

Slide 36 text

Example timing metrics in Graphite What happened?

Slide 37

Slide 37 text

Graphite
http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins&format=json
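With format=json the render API returns a list of series, each with a target name and a list of [value, timestamp] pairs. The Python sketch below parses such a response and totals the values. The sample payload is invented for illustration, and `totals` is a hypothetical helper.

```python
import json

# Invented sample response; null marks an interval with no data.
sample = """
[{"target": "stats.counters.accounts.authentication.password.succeeded.count",
  "datapoints": [[25, 1392589140], [17, 1392589150],
                 [null, 1392589160], [68, 1392589170]]}]
"""


def totals(series_json: str) -> dict:
    """Sum the known values of every series in a render API response."""
    result = {}
    for series in json.loads(series_json):
        values = [v for v, _timestamp in series["datapoints"] if v is not None]
        result[series["target"]] = sum(values)
    return result


print(totals(sample))
```

The same JSON output is what alerting checks and custom dashboards would typically consume.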

Slide 38

Slide 38 text

Anomaly Detection http://codeascraft.com/2013/06/11/introducing-kale/ • Skyline developed by Etsy - https://github.com/etsy/skyline • Simple Nagios checks - https://github.com/pyr/check-graphite

Slide 39

Slide 39 text

Metrics from your logs https://github.com/etsy/logster https://github.com/logstash/logstash • Parse log files and send data to StatsD

Slide 40

Slide 40 text

Sample logstash config
http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs
https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns
input {
  file {
    path => "/var/log/apache/access.log"
    type => "apache-access"
  }
}
filter {
  grok {
    type => "apache-access"
    pattern => "%{COMBINEDAPACHELOG}"
  }
}
output {
  statsd {
    # Count one hit every event by response
    increment => "apache.response.%{response}"
  }
}
COMBINEDAPACHELOG ... %{NUMBER:response} ...
Apache log line - 127.0.0.1 - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
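What the grok filter and statsd output do together can be approximated in a few lines of Python: pull the response code out of the log line and turn it into a counter. The regular expression here is a much simplified stand-in for COMBINEDAPACHELOG, and `metric_for` is a hypothetical helper.

```python
import re

# Simplified stand-in for the grok pattern: "request" status bytes
LOG_RE = re.compile(r'"(?P<request>[^"]*)" (?P<response>\d{3}) (?P<bytes>\d+|-)')


def metric_for(line: str):
    """Return a StatsD counter line for the response code, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    return f"apache.response.{match['response']}:1|c"


line = ('127.0.0.1 - [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(metric_for(line))
```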

Slide 41

Slide 41 text

http://logstash.net/docs/1.4.2/tutorials/metrics-from-logs http://cookbook.logstash.net/recipes/statsd-metrics/ Graphite graph of http response codes

Slide 42

Slide 42 text

http://codeascraft.com/2010/12/08/track-every-release/ Metric on PHP Warnings

Slide 43

Slide 43 text

Metric on PHP Warnings correlated to deploys

Slide 44

Slide 44 text

Metrics from Logfiles • Logfiles contain all types of data so there are many possibilities.

Slide 45

Slide 45 text

Graphite Render API - outputs an image
http://graphiteurl/render?from=-2hours&until=now&width=800&height=600&target=stats.counters.accounts.authentication.password.succeeded.count&title=Successful%20logins
Useful for building custom dashboards
http://graphite.readthedocs.org/en/latest/render_api.html

Slide 46

Slide 46 text

Graphite Powerful function library http://graphite.readthedocs.org/en/latest/functions.html • timeShift (compare today’s quantity of logins with last week’s) • asPercent (express one metric as a percentage of another, e.g. failed/attempted logins)
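The arithmetic behind asPercent is simple enough to sketch locally. The Python below computes failed logins as a percentage of attempted logins, point by point. It illustrates the idea rather than Graphite's implementation, and the sample values are made up.

```python
def as_percent(part, whole):
    """Express each point of `part` as a percentage of `whole`."""
    result = []
    for p, w in zip(part, whole):
        if p is None or not w:
            result.append(None)  # no data, or nothing to divide by
        else:
            result.append(100.0 * p / w)
    return result


failed = [2, 5, None, 0]
attempted = [20, 25, 30, 0]
print(as_percent(failed, attempted))
```

In Graphite itself the equivalent would be a single target along the lines of asPercent(failedSeries, attemptedSeries), computed on the server at render time.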

Slide 47

Slide 47 text

StatsD Pluggable back-end https://github.com/etsy/statsd/wiki/Backends
• amqp-backend • ganglia-backend • librato-backend • socket.io-backend • statsd-backend • mongo-backend • mysql-backend • datadog-backend • opentsdb backend • influxdb backend • monitis backend • instrumental backend • hosted graphite backend • statsd aggregation backend • zabbix-backend

Slide 48

Slide 48 text

Wrapping up • Measure all the things. • Use StatsD to collect data and graph it with Graphite. • Gain a better understanding of your applications. • Improve your applications.