Slide 1

Slide 1 text

MONITORING! GRAPHITE and friends

Slide 2

Slide 2 text

Structure ☑ Context ☐ Monitoring ☐ Graphite ☐ Front Ends ☐ Feeding ☐ Graphite Tips ☐ Managing / Scaling

Slide 3

Slide 3 text

context

Slide 4

Slide 4 text

Mark Crossfield, @mrmanc http://markcrossfield.co.uk Auto Trader Engineer for 7 years Continuous Delivery & Web

Slide 5

Slide 5 text

Hiring! http://careers.autotrader.co.uk/

Slide 6

Slide 6 text

AWESOME NEW OFFICES

Slide 7

Slide 7 text

page impressions / day 70 million

Slide 8

Slide 8 text

unique monthly users 14.5 million

Slide 9

Slide 9 text

searches / second at peak 1,500

Slide 10

Slide 10 text

adverts 435k

Slide 11

Slide 11 text

product & technology staff 323

Slide 12

Slide 12 text

servers 2000

Slide 13

Slide 13 text

code bases 250

Slide 14

Slide 14 text

monitoring

Slide 15

Slide 15 text

Velocity 2011 https://www.flickr.com/photos/kellan/5839797269/

Slide 16

Slide 16 text

January 2012

Slide 17

Slide 17 text

http://blog.tagman.com/2012/03/just-one-second-delay-in-page-load-can-cause-7-loss-in-customer-conversions/
“One Second Delay In Page-Load Can Cause 7% Loss In Customer Conversions”
“47% of consumers expect a page to load in 2 seconds or less”

Slide 18

Slide 18 text

3 types of metric… http://code.flickr.net/2008/10/27/counting-timing/

Slide 19

Slide 19 text

counters easy as pie to aggregate—just sum

Slide 20

Slide 20 text

timers harder to aggregate—need percentiles
 this can make scaling difficult
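
A minimal sketch of why timers are harder to aggregate than counters. The host names and response times are made up for illustration, and the nearest-rank percentile is one of several common definitions. Counters can be summed in any grouping, but averaging per-host percentiles does not give the percentile of the combined data.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times in milliseconds from two hosts.
host_a = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
host_b = [10, 10, 10, 10, 10, 10, 10, 10, 500, 900]

# Counters: summing per-host counts gives the right total.
total_requests = len(host_a) + len(host_b)

# Timers: the mean of per-host percentiles differs from the true percentile.
mean_of_p90 = (percentile(host_a, 90) + percentile(host_b, 90)) / 2
true_p90 = percentile(host_a + host_b, 90)

print(total_requests, mean_of_p90, true_p90)
```

To get the true percentile the aggregator needs the raw timings (or a mergeable summary) from every host, which is where the scaling difficulty comes from.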

Slide 21

Slide 21 text

gauges more nuanced—keep last value and use means

Slide 22

Slide 22 text

graphite

Slide 23

Slide 23 text

graphite lets you visualise metrics from its time series database

Slide 24

Slide 24 text

turns this local.random.diceroll 4 1415727632

Slide 25

Slide 25 text

into this

Slide 26

Slide 26 text

Die, composer. Die.

Slide 27

Slide 27 text

grafana is a nicer front end… more later

Slide 28

Slide 28 text

functions let you transform and summarise the values over time

Slide 29

Slide 29 text

aggregation precision reduces over time
One year: daily
One month: hourly
One week: 5min
One day: min

Slide 30

Slide 30 text

fixed size roll up means metrics do not grow over time
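
A rough sketch of why a fixed-size roll up does not grow. The four archives mirror the precisions on the earlier slide, with "one month" taken as 30 days (an assumption). Each archive holds a fixed number of points, so the total per metric is constant however long you keep feeding it.

```python
MINUTE, HOUR, DAY = 60, 3600, 86400

# (seconds per point, seconds retained), finest precision first.
archives = [
    (MINUTE, DAY),          # one day at 1 minute
    (5 * MINUTE, 7 * DAY),  # one week at 5 minutes
    (HOUR, 30 * DAY),       # one month (assumed 30 days) hourly
    (DAY, 365 * DAY),       # one year daily
]

def points_per_archive(archives):
    """Number of slots each archive needs: retention divided by step."""
    return [retention // step for step, retention in archives]

points = points_per_archive(archives)
total_points = sum(points)

print(points, total_points)
```

In carbon's storage schema notation this would be written along the lines of `1m:1d,5m:7d,1h:30d,1d:1y`.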

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

THESE ARE NOT THE DOCS YOU ARE LOOKING FOR

Slide 33

Slide 33 text

Docs Moved

Slide 34

Slide 34 text

Old, outdated docs New docs

Slide 35

Slide 35 text

graphite for summarised long term trends

Slide 36

Slide 36 text

elasticsearch
 logstash
 kibana for fine grained events

Slide 37

Slide 37 text

graphite architecture

Slide 38

Slide 38 text

Plenty of adoption
Bucky, Collectl, Diamond, Ganglia, Graphite PowerShell Functions, HoardD, Host sFlow, jmxtrans, Logster, metrics-sampler, Sensu, SqlToGraphite, SSC Serv, Backstop, Evenflow, Graphite-Newrelic, Graphite-relay, Graphios, Grockets, Ledbetter, pipe-to-graphite, statsd, Charcoal, Descartes, Dusk, Firefly, Gdash, Giraffe, Grafana, graphitus, Graph-Explorer, Graph-Index, Graphene, Graphite-Observer, Graphite-Tattle, Graphiti, Graphitoid, Graphsky, Hubot, Leonardo, Orion, Pencil, Seyren, Tasseo, Tessera, TimeseriesWidget, Cabot, graphite-beacon, rearview, Rocksteady, Shinken, Therry

Slide 39

Slide 39 text

front ends

Slide 40

Slide 40 text

image api for individual charts
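
A small sketch of building a chart URL for the render API. The host name is hypothetical; `target`, `from`, `width` and `height` are standard render parameters, but check the docs for your Graphite version.

```python
from urllib.parse import urlencode

def render_url(base, target, params=None):
    """Build a Graphite render URL for a single target."""
    query = {"target": target}
    query.update(params or {})
    return f"{base}/render?{urlencode(query)}"

# Hypothetical Graphite host; the metric name is the one used earlier in the deck.
url = render_url(
    "http://graphite.example.com",
    "local.random.diceroll",
    {"from": "-1d", "width": 600, "height": 300},
)
print(url)
```

The resulting URL can be dropped straight into an `<img>` tag or a wiki page.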

Slide 41

Slide 41 text

grafana is awesome & free as in beer and speech http://play.grafana.org/#/dashboard/db/grafana-play-home

Slide 42

Slide 42 text

http://grafana.org/

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

interactive charts let you explore values / filter series

Slide 45

Slide 45 text

time window synchronises

Slide 46

Slide 46 text

replaces composer with expression parser and builder

Slide 47

Slide 47 text

explore metrics

Slide 48

Slide 48 text

dynamic filtering using filters from queries

Slide 49

Slide 49 text

annotations show events from graphite or a query

Slide 50

Slide 50 text

no back end (elasticsearch optional)

Slide 51

Slide 51 text

dashboard sharing and playlists

Slide 52

Slide 52 text

dark and light themes

Slide 53

Slide 53 text

no backend awesome concept, better with elasticsearch

Slide 54

Slide 54 text

presentation grafana presentation at monitorama portland 2014 http://grafana.org/blog/2014/05/25/monitorama-video-and-update.html

Slide 55

Slide 55 text

http://shopify.github.io/dashing/

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

feeding need input

Slide 58

Slide 58 text

mark@spaceport  ~ $ echo "random.diceroll 4 `date +%s`" | nc -q0 graphite.org 2003
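
The same thing as the `nc` one-liner, sketched in Python. It assumes carbon is listening for the plaintext protocol on port 2003 as in the command above; the function names are made up for illustration.

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Build one plaintext protocol line: '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_metric(host, port, path, value, timestamp=None):
    """Open a TCP connection to carbon and send a single metric line."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

# Only formats the line; call send_metric(...) to actually talk to a carbon instance.
print(format_metric("random.diceroll", 4, 1415727632), end="")
```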

Slide 59

Slide 59 text

decouple don’t make everything talk graphite

Slide 60

Slide 60 text

label your axes, before it is too late talk about units, frequencies, and include in the metric name

Slide 61

Slide 61 text

log lines parsed with logster / second 11,000

Slide 62

Slide 62 text

statsd aggregates metrics in real time and sends to graphite
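
A sketch of what an application sends to statsd, assuming the usual `name:value|type` UDP line format and the conventional port 8125. The metric names are hypothetical.

```python
import socket

def statsd_packet(name, value, metric_type):
    """Build a statsd datagram; metric_type is 'c' (counter), 'ms' (timer) or 'g' (gauge)."""
    return f"{name}:{value}|{metric_type}".encode("ascii")

def send_statsd(packet, host="localhost", port=8125):
    """Fire-and-forget UDP send, so the application never blocks on monitoring."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    finally:
        sock.close()

print(statsd_packet("search.requests", 1, "c"))
print(statsd_packet("search.latency", 320, "ms"))
```

statsd then sums the counters and computes the timer statistics for each flush interval before writing to graphite.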

Slide 63

Slide 63 text

collectd can collect and send server metrics to graphite

Slide 64

Slide 64 text

10s Frequency! be careful that collectd doesn’t flood graphite

Slide 65

Slide 65 text

logstash graphite supported as output (and input)

Slide 66

Slide 66 text

metrics by coda hale.

Slide 67

Slide 67 text

http://metrics.codahale.com/ codahale metrics

Slide 68

Slide 68 text

instruments your java components.

Slide 69

Slide 69 text

aggregates timers, histograms etc

Slide 70

Slide 70 text

no statsd unless you need cross host aggregation

Slide 71

Slide 71 text

tips mostly plagiarised

Slide 72

Slide 72 text

writing twice overwrites carbon does no aggregation for you

Slide 73

Slide 73 text

feeding interval == graphite bucket this is no coincidence
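
A toy model of why writing twice overwrites and why the feeding interval should match the bucket size. This is a simplification for illustration, not whisper's actual code: each timestamp is floored to the start of its bucket and the last write to a bucket wins.

```python
def bucket_start(timestamp, interval):
    """Floor a timestamp to the start of its storage bucket."""
    return timestamp - (timestamp % interval)

class Archive:
    """Toy archive: one value per bucket, later writes replace earlier ones."""

    def __init__(self, interval):
        self.interval = interval
        self.points = {}

    def write(self, timestamp, value):
        self.points[bucket_start(timestamp, self.interval)] = value

archive = Archive(interval=60)
archive.write(1415727632, 4)  # lands in the bucket starting at 1415727600
archive.write(1415727650, 6)  # same bucket: the 4 is lost, not added
archive.write(1415727665, 2)  # next bucket
print(archive.points)
```

Feed faster than the bucket size and you silently drop values; feed slower and you leave gaps.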

Slide 74

Slide 74 text

xFilesFactor sparse metrics might not appear http://obfuscurity.com/2012/04/Unhelpful-Graphite-Tip-9
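
A sketch of the xFilesFactor rule as I understand it: when rolling up into a coarser archive, the result is only stored if at least that fraction of the finer points are known. Averaging is assumed as the aggregation method, and 0.5 as the default factor.

```python
def roll_up(points, x_files_factor=0.5):
    """Average the known points, or return None if too few of them are known."""
    known = [p for p in points if p is not None]
    if not known or len(known) / len(points) < x_files_factor:
        return None
    return sum(known) / len(known)

sparse = [1, None, None, None, None]  # e.g. a metric sent once every five minutes

print(roll_up(sparse))                      # too sparse for the default factor
print(roll_up(sparse, x_files_factor=0.0))  # any known point is enough
```

For sparse metrics, setting the factor to 0 in the storage aggregation config keeps them visible in the coarser archives.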

Slide 75

Slide 75 text

carbon limits writing new metrics avoids swamping disk with write IO http://bit.ly/maxcreates

Slide 76

Slide 76 text

graphite bookmarklet useful to load charts from images http://obfuscurity.com/2012/04/Unhelpful-Graphite-Tip-2

Slide 77

Slide 77 text

timeShift(series, duration) e.g. show yesterday’s metric against today’s timeShift(apache.http.requests, "-1day") http://graphite.readthedocs.org/en/0.9.12/functions.html#graphite.render.functions.timeShift

Slide 78

Slide 78 text

groupByNode(series, node, aggregate) e.g. aggregate many series using one node groupByNode(collectd.*.cpu.*.value, 2, "maxSeries") http://graphite.readthedocs.org/en/0.9.12/functions.html#graphite.render.functions.groupByNode
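
A pure-Python sketch of what groupByNode does, to show how the node index picks the grouping key. This is an illustration rather than Graphite's implementation; the series names and values are hypothetical. Nodes are counted from 0, so node 2 of `collectd.web01.cpu.0.value` is `cpu` and node 1 is the host.

```python
from collections import defaultdict

def group_by_node(series, node, aggregate):
    """Group series by one dot-separated node and aggregate point by point."""
    groups = defaultdict(list)
    for name, values in series.items():
        groups[name.split(".")[node]].append(values)
    return {
        key: [aggregate(column) for column in zip(*rows)]
        for key, rows in groups.items()
    }

series = {
    "collectd.web01.cpu.0.value": [10, 50],
    "collectd.web01.cpu.1.value": [30, 20],
    "collectd.web02.cpu.0.value": [5, 7],
}

busiest_overall = group_by_node(series, 2, max)   # one series named "cpu"
busiest_per_host = group_by_node(series, 1, max)  # one series per host
print(busiest_overall, busiest_per_host)
```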

Slide 79

Slide 79 text

cumulative(seriesList) show how a rate (e.g. per sec) adds up over the day http://graphite.readthedocs.org/en/0.9.12/functions.html#graphite.render.functions.cumulative

Slide 80

Slide 80 text

host.cpu-[0-7].cpu-{user,system}.value wild cards allow filtering of nodes http://graphite.readthedocs.org/en/0.9.12/terminology.html#term-series-list
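
A sketch of how such a pattern selects series, by translating it to a regular expression. It covers only the `*`, `[...]` and `{a,b}` forms shown here and is not Graphite's own matcher.

```python
import re

def pattern_to_regex(pattern):
    """Translate a Graphite-style path pattern into a compiled regex."""
    out = []
    in_braces = False
    in_class = False
    for ch in pattern:
        if in_class:
            out.append(ch)              # copy character classes through unchanged
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "*":
            out.append("[^.]*")         # a wildcard never crosses a dot
        elif ch == "{":
            in_braces = True
            out.append("(?:")
        elif ch == "}":
            in_braces = False
            out.append(")")
        elif ch == "," and in_braces:
            out.append("|")             # alternatives inside braces
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))

matcher = pattern_to_regex("host.cpu-[0-7].cpu-{user,system}.value")
names = [
    "host.cpu-3.cpu-user.value",
    "host.cpu-8.cpu-user.value",
    "host.cpu-0.cpu-idle.value",
    "host.cpu-7.cpu-system.value",
]
matched = [n for n in names if matcher.fullmatch(n)]
print(matched)
```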

Slide 81

Slide 81 text

teatime is 4pm Graphite understands the UNIX At time specification e.g. noon yesterday, now-2weeks http://oss.oetiker.ch/rrdtool/doc/rrdfetch.en.html#IAT_STYLE_TIME_SPECIFICATION

Slide 82

Slide 82 text

monitor carbon carbon records its own metrics per minute to graphite http://obfuscurity.com/2012/06/Watching-the-Carbon-Feed

Slide 83

Slide 83 text

whisper-*.py create, dump, fetch, info, merge, resize, set-aggregation-method, update, diff https://github.com/graphite-project/whisper

Slide 84

Slide 84 text

summarize be cautious around aggregation boundaries http://graphite.readthedocs.org/en/0.9.12/functions.html#graphite.render.functions.summarize

Slide 85

Slide 85 text

holt winters intelligent aberration detection http://graphite.readthedocs.org/en/0.9.12/functions.html#graphite.render.functions.holtWintersAberration

Slide 86

Slide 86 text

events(*tags) number of events matching tags at this point in time use to annotate your charts e.g. events("deploy", "change") http://graphite.readthedocs.org/en/0.9.12/functions.html#graphite.render.functions.events
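
Events have to get into graphite before you can chart them. A hedged sketch of recording one, assuming the `/events/` endpoint accepts a JSON body with `what`, `tags` (space separated) and `data` fields, as I believe it does in the 0.9.x line. The host and event text are hypothetical.

```python
import json
import urllib.request

def event_payload(what, tags, data=""):
    """Build the JSON body for a Graphite event; tags are joined with spaces."""
    body = {"what": what, "tags": " ".join(tags), "data": data}
    return json.dumps(body).encode("utf-8")

def post_event(base_url, payload):
    """POST the event to Graphite; returns the HTTP status code."""
    request = urllib.request.Request(f"{base_url}/events/", data=payload, method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status

payload = event_payload("Deployed search", ["deploy", "change"])
print(payload.decode("utf-8"))
# post_event("http://graphite.example.com", payload)  # hypothetical host
```

Calling this from your deployment pipeline gives you deploy markers on every chart for free.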

Slide 87

Slide 87 text

No content

Slide 88

Slide 88 text

No content

Slide 89

Slide 89 text

managing / scaling

Slide 90

Slide 90 text

metrics relayed / minute 300,000

Slide 91

Slide 91 text

application metrics 30,000

Slide 92

Slide 92 text

write iops / second / cache (~8% of total activity) 8,000

Slide 93

Slide 93 text

sqlite do not run in production

Slide 94

Slide 94 text

scale with one box while you can…

Slide 95

Slide 95 text

Graphite Architecture

Slide 96

Slide 96 text

relay lets you shard (consistent hash) or replicate
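
A minimal consistent hash ring, to show the idea behind sharding metrics across caches. This is a generic sketch, not carbon's implementation; the cache names, hash function and replica count are all assumptions. The useful property is that adding a cache only moves metrics onto the new cache and never shuffles them between existing ones.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent hash ring mapping metric names to nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node gets many positions on the ring to even out the spread.
        ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [position for position, _ in ring]
        self._nodes = [node for _, node in ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def get_node(self, metric):
        """Walk clockwise from the metric's position to the next node position."""
        index = bisect.bisect(self._keys, self._hash(metric)) % len(self._keys)
        return self._nodes[index]

metrics = [f"app.server{n}.requests" for n in range(200)]
before = HashRing(["cache-a", "cache-b", "cache-c"])
after = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
moved = [m for m in metrics if before.get_node(m) != after.get_node(m)]
print(f"{len(moved)} of {len(metrics)} metrics moved to the new cache")
```

Metrics that do move leave their history behind on the old cache, which is part of the pain that rebalancing tools try to ease.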

Slide 97

Slide 97 text

No content

Slide 98

Slide 98 text

carbon docs leave a lot to be desired

Slide 99

Slide 99 text

blogs only slightly contradictory

Slide 100

Slide 100 text

http://bitprophet.org/blog/2013/03/07/graphite/

Slide 101

Slide 101 text

https://gist.github.com/obfuscurity/63399584ea4d95f921e4

Slide 102

Slide 102 text

https://answers.launchpad.net/graphite/+question/178969

Slide 103

Slide 103 text

http://grey-boundary.com/the-architecture-of-clustering-graphite/

Slide 104

Slide 104 text

carbonate provides missing bits of carbon to assist scaling https://github.com/jssjr/carbonate

Slide 105

Slide 105 text

graphite-api without the front end https://github.com/brutasse/graphite-api functional goodness

Slide 106

Slide 106 text

@obfuscurity great blogger and an authority on graphite http://obfuscurity.com/

Slide 107

Slide 107 text

No content

Slide 108

Slide 108 text

Summary
• Good tool for long term trending of time series data
• Use Grafana and perhaps Dashing as a front end
• Feed your data in through statsd, collectd & metrics
• Leverage the functions
• You may experience scaling pain

Slide 109

Slide 109 text

QUESTIONS?