Slide 1

Slide 1 text

StatsD: Measure Anything / Measure Everything Dan Rowe April 30, 2013 Wednesday, May 1, 13

Slide 2

Slide 2 text

Who am I? • Dan Rowe • DevTools Engineer at Etsy • drowe@etsy.com • @draco2002 • https://github.com/draco2003

Slide 3

Slide 3 text

Who is Etsy? • 1.57 billion page views • $101.7 million worth of goods sold • 4,534,479 items sold • 22+ million members. Etsy is the marketplace we make together. March 2013 stats: http://www.etsy.com/blog/news/2013/etsy-statistics-march-2013-weather-report/

Slide 4

Slide 4 text

What is StatsD? • A network daemon that listens for statistics, like counters and timers, and sends aggregates to one or more pluggable backend services. https://github.com/etsy/statsd

Slide 5

Slide 5 text

Why StatsD? • Real-time metrics at high concurrency, without introducing user-facing latency. • Users shouldn't have to wait while you measure everything. • You shouldn't have to limit what you measure to avoid the user noticing.

Slide 6

Slide 6 text

Why UDP? • Fire and Forget • No waiting • Best Effort

Slide 7

Slide 7 text

What can I send? • Simple text-based protocol messages • Accepts Counters, Timers, Gauges, and Sets • Calculates additional metrics for you as well.
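
The text protocol is simple enough to build by hand. As a minimal sketch, assuming the `name:value|type` layout used on the following slides, a helper could look like this (`format_metric` and `METRIC_TYPES` are hypothetical names for illustration, not part of StatsD):

```python
# Type suffixes as they appear in the protocol examples on the slides.
METRIC_TYPES = {"counter": "c", "timer": "ms", "gauge": "g", "set": "s"}


def format_metric(name, value, kind):
    """Return one protocol line such as 'foo.car:1|c'."""
    if kind not in METRIC_TYPES:
        raise ValueError(f"unknown metric kind: {kind!r}")
    return f"{name}:{value}|{METRIC_TYPES[kind]}"


print(format_metric("foo.car", 1, "counter"))  # foo.car:1|c
print(format_metric("foo.tar", 30, "timer"))   # foo.tar:30|ms
```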

Slide 8

Slide 8 text

Counters • stats.counters.foo.car.rate • stats.counters.foo.car.count foo.car:1|c
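
To make the two counter outputs concrete, here is a rough sketch of what a flush might compute. It assumes the count is the sum of the values received during the interval, the rate is that count per second, and the flush interval is 10 seconds; `flush_counters` is a hypothetical helper, not StatsD code.

```python
from collections import defaultdict


def flush_counters(lines, flush_interval_ms=10000):
    """Sum counter lines and derive count and per-second rate for one flush."""
    totals = defaultdict(float)
    for line in lines:
        name, rest = line.split(":", 1)
        value, kind = rest.split("|")[:2]
        if kind == "c":
            totals[name] += float(value)
    seconds = flush_interval_ms / 1000
    out = {}
    for name, count in totals.items():
        out[f"stats.counters.{name}.count"] = count
        out[f"stats.counters.{name}.rate"] = count / seconds
    return out


result = flush_counters(["foo.car:1|c"] * 20)
print(result["stats.counters.foo.car.count"])  # 20.0
print(result["stats.counters.foo.car.rate"])   # 2.0
```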

Slide 9

Slide 9 text

Gauges • stats.gauges.foo.gar foo.gar:30|g foo.gar:-5|g foo.gar:+5|g
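
The three example messages show two kinds of gauge update. A small sketch of the semantics, assuming a value with a leading `+` or `-` adjusts the current gauge and a bare value replaces it (`apply_gauge` is a hypothetical name):

```python
def apply_gauge(current, raw_value):
    """Apply one gauge value: a signed value is a delta, a bare one is absolute."""
    if raw_value[0] in "+-":
        return current + float(raw_value)
    return float(raw_value)


value = 0.0
for raw in ["30", "-5", "+5"]:
    value = apply_gauge(value, raw)
print(value)  # 30.0
```

One consequence of this design is that a gauge cannot be set directly to a negative number in a single message, since the sign is read as a delta.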

Slide 10

Slide 10 text

Sets • stats.sets.foo.sar foo.sar:30|s foo.sar:50|s
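
A set reports how many distinct values were seen during the flush interval. A minimal sketch of that idea, where `count_unique` is a hypothetical helper and the third message is an extra duplicate added for illustration:

```python
def count_unique(lines):
    """Count distinct values per set metric across one flush interval."""
    seen = {}
    for line in lines:
        name, rest = line.split(":", 1)
        value, kind = rest.split("|")[:2]
        if kind == "s":
            seen.setdefault(name, set()).add(value)
    return {name: len(values) for name, values in seen.items()}


print(count_unique(["foo.sar:30|s", "foo.sar:50|s", "foo.sar:30|s"]))
# {'foo.sar': 2}
```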

Slide 11

Slide 11 text

Timers • stats.timers.foo.tar.mean_90 • stats.timers.foo.tar.std • stats.timers.foo.tar.sum_90 foo.tar:30|ms
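
The `_90` suffix refers to the 90th percentile. As a sketch of one plausible calculation, assuming `sum_90` and `mean_90` cover the lowest 90% of samples and `std` is the population standard deviation over all samples (the exact rounding StatsD uses may differ, and `timer_stats` is a hypothetical helper):

```python
import math


def timer_stats(values, pct=90):
    """Compute percentile-trimmed sum and mean plus the overall std deviation."""
    ordered = sorted(values)
    count = len(ordered)
    # Keep the lowest pct percent of samples, but always at least one.
    keep = max(1, round(count * pct / 100))
    kept = ordered[:keep]
    mean = sum(ordered) / count
    std = math.sqrt(sum((v - mean) ** 2 for v in ordered) / count)
    return {
        f"sum_{pct}": sum(kept),
        f"mean_{pct}": sum(kept) / keep,
        "std": std,
    }


stats = timer_stats([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(stats["sum_90"])   # 450
print(stats["mean_90"])  # 50.0
```

Trimming the slowest samples keeps a single outlier from dominating the reported mean.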

Slide 12

Slide 12 text

Histograms • stats.timers.foo.tar.histogram.bin_10 • stats.timers.foo.tar.histogram.bin_20 histogram:[{metric:"foo.tar", bins:[10,20,30,50]}]
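
The bin boundaries in the config translate into per-bin counts. A rough sketch, assuming each bin counts the timer samples below its upper bound that no lower bin has already claimed (`histogram` is a hypothetical helper, and the sample values are made up for illustration):

```python
def histogram(values, bins):
    """Count samples per bin; each sample lands in the first bin it fits under."""
    ordered = sorted(values)
    result = {}
    i = 0
    for upper in bins:
        freq = 0
        while i < len(ordered) and ordered[i] < upper:
            freq += 1
            i += 1
        result[f"bin_{upper}"] = freq
    return result


print(histogram([3, 8, 12, 19, 25, 45], [10, 20, 30, 50]))
# {'bin_10': 2, 'bin_20': 2, 'bin_30': 1, 'bin_50': 1}
```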

Slide 13

Slide 13 text

How to send Metrics? • echo -n "foo.car:1|c" | nc -w0 -u 127.0.0.1 8125 • https://github.com/etsy/statsd/tree/master/examples
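
The same fire-and-forget send can be done from application code. A minimal Python sketch, assuming StatsD listens on 127.0.0.1:8125 as in the command above (`send_metric` is a hypothetical helper, not one of the linked example clients):

```python
import socket


def send_metric(line, host="127.0.0.1", port=8125):
    """Send one metric line over UDP and return without waiting for a reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


send_metric("foo.car:1|c")
```

Because UDP is connectionless, the call succeeds even when nothing is listening, which is what keeps measurement off the user's critical path.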

Slide 14

Slide 14 text

What does StatsD do? • Aggregates metrics for a set interval, and then flushes them to the backend. • This buffers the backend from the user/application.

Slide 15

Slide 15 text

Where do metrics go? • StatsD has a pluggable backend system. • By default it comes with these backends: • Graphite • Console • Repeater

Slide 16

Slide 16 text

Pluggable Backends • Additional backend modules can be installed simply as additional npm modules. • cd statsd/ && npm install statsd-socket.io

Slide 17

Slide 17 text

TCP Admin Interface • Defaults to listening on port 8126 • Commands: stats, counters, timers, gauges, delcounters, deltimers, delgauges, health, quit
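
The admin interface can be scripted as well as used interactively. A hedged sketch of a tiny client, assuming commands are newline-terminated text and that the `del*` commands take metric names as arguments (both helper names are hypothetical):

```python
import socket

# Commands listed on the slide.
ADMIN_COMMANDS = {
    "stats", "counters", "timers", "gauges",
    "delcounters", "deltimers", "delgauges", "health", "quit",
}


def build_admin_command(command, *args):
    """Validate a command name and return the bytes to write to the socket."""
    if command not in ADMIN_COMMANDS:
        raise ValueError(f"unknown admin command: {command!r}")
    return (" ".join([command, *args]) + "\n").encode("ascii")


def admin_query(command, *args, host="127.0.0.1", port=8126, timeout=2.0):
    """Send one command over TCP and return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_admin_command(command, *args))
        return sock.recv(4096).decode("ascii", errors="replace")


# Requires a running StatsD instance:
# print(admin_query("health"))
```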

Slide 18

Slide 18 text

Scaling StatsD • A single core only scales so far.

Slide 19

Slide 19 text

Scaling StatsD: Metric Sampling • Reduces the volume of metrics sent to StatsD. • statsd-timer-metric-counts.sh • keyFlush setting and frequent keys log.
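
Sampling happens on the client: only a fraction of events are sent, and each message carries its sample rate so the server can scale the value back up. A sketch using the `|@rate` suffix from the StatsD protocol (both function names are hypothetical, and `scaled_count` only illustrates the server-side arithmetic):

```python
import random


def sampled_counter(name, sample_rate, rng=random.random):
    """Return a protocol line for a sampled counter, or None to skip sending."""
    if rng() >= sample_rate:
        return None
    return f"{name}:1|c|@{sample_rate}"


def scaled_count(line):
    """Server-side view: divide the value by the sample rate, if one is given."""
    body, *rest = line.split("|@")
    value = float(body.split(":", 1)[1].split("|")[0])
    rate = float(rest[0]) if rest else 1.0
    return value / rate


print(scaled_count("foo.car:1|c|@0.1"))  # 10.0
```

At a rate of 0.1, roughly one event in ten is sent and each counts as ten, so totals stay approximately correct while UDP traffic drops by about 90%.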

Slide 20

Slide 20 text

Scaling StatsD: deleteIdleStats: true • By default StatsD "remembers" all metrics that it has received. • If it hasn't received a data point for the current flushInterval, it sends a zero. • This forces StatsD to process old/stale metrics until the next restart, or until they are manually deleted via the TCP interface.
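
The difference between the two modes shows up on the second flush after a metric goes quiet. A toy model of that behavior, not StatsD's actual implementation (`flush` is a hypothetical helper):

```python
def flush(counters, delete_idle_stats):
    """Return what one flush emits, then reset or forget each counter."""
    emitted = dict(counters)
    for name in list(counters):
        if delete_idle_stats:
            del counters[name]  # forgotten until the metric is seen again
        else:
            counters[name] = 0  # remembered, so it flushes as zero while idle
    return emitted


remembered = {"foo.car": 3}
print(flush(remembered, delete_idle_stats=False))  # {'foo.car': 3}
print(flush(remembered, delete_idle_stats=False))  # {'foo.car': 0}

forgotten = {"foo.car": 3}
print(flush(forgotten, delete_idle_stats=True))    # {'foo.car': 3}
print(flush(forgotten, delete_idle_stats=True))    # {}
```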

Slide 21

Slide 21 text

Scaling StatsD: UDP Tuning • # increase default core memory sizes • net.core.rmem_default = 16777216 • net.core.wmem_default = 16777216 • net.ipv4.udp_wmem_min = 67108864 • net.ipv4.udp_rmem_min = 67108864 • net.ipv4.udp_mem = 4648512 6198016 9297024

Slide 22

Slide 22 text

Scaling StatsD: StatsD Cluster Proxy

Slide 23

Slide 23 text

How does it work? • Uses a hashring based on the metric name to assign which StatsD instance to send to. • Monitors StatsD instances for existence/health. • Recalculates hashring based on StatsD instance health. • deleteIdleStats must be turned on.
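
The hashring idea can be sketched in a few lines. This is an illustrative consistent-hash ring, not the proxy's actual code; the `HashRing` class, the SHA-256 hash, and the 100 virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Map metric names to nodes so each name always reaches the same node."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._keys = []   # sorted hash points on the ring
        self._ring = {}   # hash point -> node
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        """Place a healthy node on the ring at several virtual points."""
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._ring[point] = node
            bisect.insort(self._keys, point)

    def remove(self, node):
        """Drop an unhealthy node; only its metrics move to other nodes."""
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            if self._ring.pop(point, None) is not None:
                self._keys.remove(point)

    def get(self, metric_name):
        """Return the node owning the first ring point after the name's hash."""
        if not self._keys:
            raise LookupError("no healthy nodes")
        idx = bisect.bisect(self._keys, self._hash(metric_name)) % len(self._keys)
        return self._ring[self._keys[idx]]


ring = HashRing(["127.0.0.1:8127", "127.0.0.1:8129", "127.0.0.1:8131"])
owner = ring.get("foo.car")
ring.remove(owner)                 # simulate a failed health check
print(ring.get("foo.car") != owner)  # True
```

Routing by metric name matters because aggregation is per instance: if the same metric reached two instances, each would flush its own partial value.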

Slide 24

Slide 24 text

Configuring the Proxy
Example config:
{
  nodes: [
    {host: '127.0.0.1', port: 8127, adminport: 8128},
    {host: '127.0.0.1', port: 8129, adminport: 8130},
    {host: '127.0.0.1', port: 8131, adminport: 8132}
  ],
  udp_version: 'udp4',
  host: '0.0.0.0',
  port: '8125',
  checkInterval: 1000,
  cacheSize: 10000
}
Configure two or more StatsD instances, and add them to the Proxy configuration.

Slide 25

Slide 25 text

Why a separate app? • Allows StatsD to scale beyond one machine. • Keeps StatsD core as simple as possible. • If a StatsD instance or server goes down, then the hashring is recalculated and metrics still flow.

Slide 26

Slide 26 text

Demo Time • Show SocketIO backend • Show Multiple StatsD Instances • Show failover functionality

Slide 27

Slide 27 text

Hiring • Etsy is always looking for great Engineers. • Come help build and improve the tools that let Etsy Monitor, Develop and Deploy at scale. • http://etsy.com/careers

Slide 28

Slide 28 text

Questions? • Feel free to ask now, or later via • drowe@etsy.com • @draco2002 • https://github.com/draco2003