Slide 1

Slide 1 text

Trending with Purpose Jason Dixon

Slide 2

Slide 2 text

Trending

Slide 3

Slide 3 text

A general direction in which something is developing or changing.

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Why do we trend?

Slide 6

Slide 6 text

Planning for growth.

Slide 7

Slide 7 text

Predicting or diagnosing failure.

Slide 8

Slide 8 text

... instead of finding out from your customer.

Slide 9

Slide 9 text

Operational Questions

Slide 10

Slide 10 text

Just because a host or service responds, how do you know it’s working?

Slide 11

Slide 11 text

If you haven’t measured good, how will you recognize bad?

Slide 12

Slide 12 text

You don’t know what might break, so collect everything now.

Slide 13

Slide 13 text

"Those who cannot remember the past are condemned to repeat it" George Santayana

Slide 14

Slide 14 text

"I don't care if my servers are on fire as long as they're making me money" Business Owner

Slide 15

Slide 15 text

How can we add Business Value?

Slide 16

Slide 16 text

Start simple.

Slide 17

Slide 17 text

Dance with the one who brought you.

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

Nagios

Slide 20

Slide 20 text

Nagios • Fault Detection

Slide 21

Slide 21 text

Nagios • Fault Detection • Notifications

Slide 22

Slide 22 text

Nagios • Fault Detection • Notifications • Escalations

Slide 23

Slide 23 text

Nagios • Fault Detection • Notifications • Escalations • Acknowledgements/Downtime

Slide 24

Slide 24 text

Nagios • Fault Detection • Notifications • Escalations • Acknowledgements/Downtime • http://www.nagios.org/

Slide 25

Slide 25 text

Nagios

Slide 26

Slide 26 text

Nagios • What it does well:

Slide 27

Slide 27 text

Nagios • What it does well: • Free

Slide 28

Slide 28 text

Nagios • What it does well: • Free • Extensible

Slide 29

Slide 29 text

Nagios • What it does well: • Free • Extensible • Plugins

Slide 30

Slide 30 text

Nagios • What it does well: • Free • Extensible • Plugins • Configuration templates

Slide 31

Slide 31 text

Nagios • What it does well: • Free • Extensible • Plugins • Configuration templates • Popular (lesser of all free evils)

Slide 32

Slide 32 text

Nagios • What it does well: • Free • Extensible • Plugins • Configuration templates • Popular (lesser of all free evils) • Log metrics (“performance data”)
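The "performance data" in the last bullet is the part of a plugin's output after the `|` character, written as `label=value[unit];warn;crit;min;max`. Below is a minimal Python sketch of extracting those metrics so they can be trended. The function name `parse_perfdata` and the sample output line are illustrative assumptions, and quoted labels containing spaces are not handled.

```python
import re

# Matches one perfdata item: label=value[unit]; thresholds after ';' are ignored.
_PERF_ITEM = re.compile(r"([^=\s]+)=(-?[\d.]+)([^;\s]*)")

def parse_perfdata(plugin_output):
    """Return {label: (value, unit)} from a Nagios plugin output line.

    Simplified: labels containing spaces (quoted labels) are not handled.
    """
    if "|" not in plugin_output:
        return {}
    perfdata = plugin_output.split("|", 1)[1]
    metrics = {}
    for label, value, unit in _PERF_ITEM.findall(perfdata):
        metrics[label] = (float(value), unit)
    return metrics

# Hypothetical plugin output, for illustration only.
sample = "OK - load average: 0.42, 0.36, 0.30|load1=0.420;5.000;10.000;0; load5=0.360;4.000;6.000;0;"
print(parse_perfdata(sample))
```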

Slide 33

Slide 33 text

Nagios

Slide 34

Slide 34 text

Nagios • Where it sucks:

Slide 35

Slide 35 text

Nagios • Where it sucks: • Interface

Slide 36

Slide 36 text

Nagios • Where it sucks: • Interface • (Lack of) Scalability

Slide 37

Slide 37 text

Nagios • Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits

Slide 38

Slide 38 text

Nagios • Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits • Acknowledgements never expire

Slide 39

Slide 39 text

Nagios • Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits • Acknowledgements never expire • Configuration (over-)flexibility

Slide 40

Slide 40 text

Nagios • Where it sucks: • Interface • (Lack of) Scalability • Promotes bad habits • Acknowledgements never expire • Configuration (over-)flexibility • Flapping

Slide 41

Slide 41 text

Nagios

Slide 42

Slide 42 text

PNP4Nagios

Slide 43

Slide 43 text

PNP4Nagios • Basic graphing & dashboard capabilities

Slide 44

Slide 44 text

PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios performance data

Slide 45

Slide 45 text

PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios performance data • Creates graphs with RRD

Slide 46

Slide 46 text

PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios performance data • Creates graphs with RRD • Limited introspection/correlation

Slide 47

Slide 47 text

PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios performance data • Creates graphs with RRD • Limited introspection/correlation • http://www.pnp4nagios.org/

Slide 48

Slide 48 text

PNP4Nagios

Slide 49

Slide 49 text

Graphite

Slide 50

Slide 50 text

Graphite • Metric storage

Slide 51

Slide 51 text

Graphite • Metric storage • Complex graph creation

Slide 52

Slide 52 text

Graphite • Metric storage • Complex graph creation • Web and “CLI” interfaces

Slide 53

Slide 53 text

Graphite • Metric storage • Complex graph creation • Web and “CLI” interfaces • Created and released by Orbitz.com

Slide 54

Slide 54 text

Graphite • Metric storage • Complex graph creation • Web and “CLI” interfaces • Created and released by Orbitz.com • http://graphite.wikidot.com/

Slide 55

Slide 55 text

Graphite

Slide 56

Slide 56 text

Graphite

Slide 57

Slide 57 text

Graphite • The good stuff:

Slide 58

Slide 58 text

Graphite • The good stuff: • Horizontally scalable

Slide 59

Slide 59 text

Graphite • The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI)

Slide 60

Slide 60 text

Graphite • The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points

Slide 61

Slide 61 text

Graphite • The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available

Slide 62

Slide 62 text

Graphite • The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc...

Slide 63

Slide 63 text

Graphite • The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc... • Share graphs with other users

Slide 64

Slide 64 text

Graphite • The good stuff: • Horizontally scalable • Rapid graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc... • Share graphs with other users • Supports existing RRD databases
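The formulas listed above are applied by wrapping metric paths in functions inside the `target` parameter of a Graphite render URL. Here is a small Python sketch of building such a URL. The hostname and metric paths are made up for illustration; `sumSeries` and `derivative` are used as examples of Graphite's function names.

```python
from urllib.parse import urlencode

def render_url(base, targets, since="-24h", fmt="png"):
    """Build a Graphite render URL with one target= parameter per series expression."""
    params = [("target", t) for t in targets]
    params += [("from", since), ("format", fmt)]
    return base.rstrip("/") + "/render?" + urlencode(params)

# Hypothetical host and metric paths.
url = render_url(
    "http://graphite.example.com",
    [
        "sumSeries(endpoint.*.app.requests)",
        "derivative(endpoint.web01.app.bytes_sent)",
    ],
)
print(url)
```

Because the whole graph is described by the URL, sharing a graph can be as simple as sharing the link.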

Slide 65

Slide 65 text

Graphite

Slide 66

Slide 66 text

Graphite • The not-so-good stuff:

Slide 67

Slide 67 text

Graphite • The not-so-good stuff: • Not a dashboard (well, sorta)

Slide 68

Slide 68 text

Graphite • The not-so-good stuff: • Not a dashboard (well, sorta) • No hover details

Slide 69

Slide 69 text

Graphite • The not-so-good stuff: • Not a dashboard (well, sorta) • No hover details • Single y-axis

Slide 70

Slide 70 text

Graphite components: Metrics, Carbon, Whisper, Web

Slide 71

Slide 71 text

Carbon

Slide 72

Slide 72 text

Carbon • agent - starts other daemons, receives metrics and pipelines them to cache

Slide 73

Slide 73 text

Carbon • agent - starts other daemons, receives metrics and pipelines them to cache • aggregator - aggregate and transform your data before storage

Slide 74

Slide 74 text

Carbon • agent - starts other daemons, receives metrics and pipelines them to cache • aggregator - aggregate and transform your data before storage • cache - caches metrics for real-time graphing, pipelines them to persister

Slide 75

Slide 75 text

Carbon • agent - starts other daemons, receives metrics and pipelines them to cache • aggregator - aggregate and transform your data before storage • cache - caches metrics for real-time graphing, pipelines them to persister • persister - writes persistent data to disk

Slide 76

Slide 76 text

Whisper

Slide 77

Slide 77 text

Whisper • Metrics database format

Slide 78

Slide 78 text

Whisper • Metrics database format • Supplanted RRDtool

Slide 79

Slide 79 text

Whisper • Metrics database format • Supplanted RRDtool • Accepts out-of-order data

Slide 80

Slide 80 text

Whisper • Metrics database format • Supplanted RRDtool • Accepts out-of-order data • Supports pipelining of data in a single operation (multiplexing)
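One way to see why a fixed-size store can accept out-of-order data: if each datapoint's slot is computed from its timestamp, arrival order stops mattering. The Python sketch below is a toy illustration of that idea only. It is not Whisper's on-disk format, and the class name `SlotStore` is an assumption made for this example.

```python
class SlotStore:
    """Toy fixed-size store: one slot per time step, oldest data overwritten."""

    def __init__(self, step, slots):
        self.step = step          # seconds per datapoint
        self.slots = slots        # number of datapoints retained
        self.data = [None] * slots

    def _index(self, timestamp):
        # The slot is a pure function of the timestamp, not of arrival order.
        return (timestamp // self.step) % self.slots

    def update(self, timestamp, value):
        aligned = timestamp - (timestamp % self.step)
        self.data[self._index(timestamp)] = (aligned, value)

    def fetch(self, timestamp):
        entry = self.data[self._index(timestamp)]
        aligned = timestamp - (timestamp % self.step)
        # A slot may hold an older lap of the ring; only return an exact match.
        if entry is not None and entry[0] == aligned:
            return entry[1]
        return None

store = SlotStore(step=60, slots=5)
store.update(1200, 3.0)   # newest point arrives first
store.update(1080, 1.0)   # older point arrives late and still lands in its own slot
print(store.fetch(1080), store.fetch(1200), store.fetch(1140))
```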

Slide 81

Slide 81 text

Coming Soon - Ceres

Slide 82

Slide 82 text

Coming Soon - Ceres • Smaller files - doesn’t pad missing datapoints

Slide 83

Slide 83 text

Coming Soon - Ceres • Smaller files - doesn’t pad missing datapoints • Doesn’t store timestamps, calculates them

Slide 84

Slide 84 text

Coming Soon - Ceres • Smaller files - doesn’t pad missing datapoints • Doesn’t store timestamps, calculates them • Federates individual datapoints

Slide 85

Slide 85 text

Graphite Web

Slide 86

Slide 86 text

Graphite Web • Traditional web interface

Slide 87

Slide 87 text

Graphite Web • Traditional web interface • Javascript CLI

Slide 88

Slide 88 text

Graphite Web • Traditional web interface • Javascript CLI • Rudimentary dashboard

Slide 89

Slide 89 text

Graphite Web • Traditional web interface • Javascript CLI • Rudimentary dashboard • Django application

Slide 90

Slide 90 text

Sending metrics to Graphite

Slide 91

Slide 91 text

Sending metrics to Graphite • Connect to Carbon socket (tcp/2003)

Slide 92

Slide 92 text

Sending metrics to Graphite • Connect to Carbon socket (tcp/2003) • Send your data

Slide 93

Slide 93 text

Sending metrics to Graphite • Connect to Carbon socket (tcp/2003) • Send your data my $sock = IO::Socket::INET->new('127.0.0.1:2003'); $sock->send("endpoint.app.metric $value $epoch\n");

Slide 94

Slide 94 text

Sending metrics to Graphite • Connect to Carbon socket (tcp/2003) • Send your data my $sock = IO::Socket::INET->new('127.0.0.1:2003'); $sock->send("endpoint.app.metric $value $epoch\n"); • ???

Slide 95

Slide 95 text

Sending metrics to Graphite • Connect to Carbon socket (tcp/2003) • Send your data my $sock = IO::Socket::INET->new('127.0.0.1:2003'); $sock->send("endpoint.app.metric $value $epoch\n"); • ??? • Profit!
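The same two steps (connect to Carbon on tcp/2003, send a `path value epoch` line) can be sketched in Python. The helper names `format_metric` and `send_metric` are assumptions for illustration. The formatting is split out so it can be checked without a running Carbon daemon.

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Format one plaintext line for Carbon: path, value, epoch seconds, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

def send_metric(path, value, timestamp=None, host="127.0.0.1", port=2003):
    """Open a TCP connection to Carbon, send one metric line, and close."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

print(format_metric("endpoint.app.metric", 42, 1325376000), end="")
# To actually send (requires a Carbon listener on 127.0.0.1:2003):
# send_metric("endpoint.app.metric", 42)
```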

Slide 96

Slide 96 text

What should we collect?

Slide 97

Slide 97 text

App/DB Profiling

Slide 98

Slide 98 text

App/DB Profiling • How fast is our:

Slide 99

Slide 99 text

App/DB Profiling • How fast is our: • function foo() for each iteration

Slide 100

Slide 100 text

App/DB Profiling • How fast is our: • function foo() for each iteration • SQL query

Slide 101

Slide 101 text

App/DB Profiling • How fast is our: • function foo() for each iteration • SQL query • 3rd party API service (e.g. payment gateway, social media)

Slide 102

Slide 102 text

App/DB Profiling

Slide 103

Slide 103 text

App/DB Profiling • How many times do we:

Slide 104

Slide 104 text

App/DB Profiling • How many times do we: • call function foo()

Slide 105

Slide 105 text

App/DB Profiling • How many times do we: • call function foo() • register a new user

Slide 106

Slide 106 text

App/DB Profiling • How many times do we: • call function foo() • register a new user • chargeback a sale
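Both questions ("how fast" and "how many times") can be answered with very little instrumentation. The Python sketch below keeps timings and counts in process. The metric names and the `timed` helper are hypothetical, and a real setup would ship the values to a metrics store rather than hold them in memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

counters = defaultdict(int)     # how many times something happened
timings = defaultdict(list)     # how long each occurrence took, in milliseconds

@contextmanager
def timed(name):
    """Record the wall-clock duration of the enclosed block and count the call."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings[name].append(elapsed_ms)
        counters[name + ".calls"] += 1

def foo():
    # Stand-in for the function, SQL query, or third-party API call being profiled.
    return sum(range(1000))

for _ in range(3):
    with timed("app.foo"):
        foo()

counters["app.user.registered"] += 1   # count a business event directly
print(counters["app.foo.calls"], len(timings["app.foo"]))
```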

Slide 107

Slide 107 text

IT exists to support Business.

Slide 108

Slide 108 text

IT exists to support Business. DevOps

Slide 109

Slide 109 text

Not the other way around.

Slide 110

Slide 110 text

Business Profiling

Slide 111

Slide 111 text

Business Profiling • How many sales did we generate this hour:

Slide 112

Slide 112 text

Business Profiling • How many sales did we generate this hour: • per sku

Slide 113

Slide 113 text

Business Profiling • How many sales did we generate this hour: • per sku • from the recent ad campaign

Slide 114

Slide 114 text

Business Profiling • How many sales did we generate this hour: • per sku • from the recent ad campaign • from users in West Bumble, Arkansas

Slide 115

Slide 115 text

Last week?

Slide 116

Slide 116 text

Last month?

Slide 117

Slide 117 text

Last year?
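Answering "this hour, per SKU" and then "last week, last month, last year" comes down to bucketing each sale by hour and storing the counts as time series. Here is a Python sketch of that bucketing. The sales records, the `business.sales` prefix, and the function names are all made up for illustration.

```python
from collections import Counter

def hourly_sales(sales):
    """Count sales per (hour bucket, SKU) from (epoch_seconds, sku) pairs."""
    counts = Counter()
    for epoch, sku in sales:
        hour = epoch - (epoch % 3600)      # truncate to the start of the hour
        counts[(hour, sku)] += 1
    return counts

def to_metric_lines(counts, prefix="business.sales"):
    """Render the counts as 'path value epoch' lines, one per hour and SKU."""
    return [
        f"{prefix}.{sku}.count {n} {hour}"
        for (hour, sku), n in sorted(counts.items())
    ]

# Made-up sales records, for illustration only.
sales = [(1325376000, "sku123"), (1325376900, "sku123"), (1325379600, "sku456")]
for line in to_metric_lines(hourly_sales(sales)):
    print(line)
```

Once the hourly series exists, the longer views are just wider time ranges over the same data.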

Slide 118

Slide 118 text

Be Creative.

Slide 119

Slide 119 text

More Data > Less Data

Slide 120

Slide 120 text

You probably won’t know what metrics you might need until it’s too late.

Slide 121

Slide 121 text

Don’t be that guy.

Slide 122

Slide 122 text

Storage is Cheap.

Slide 123

Slide 123 text

Data Sources • Nagios • Munin • SNMP • Ganglia • collectd • SQL • Logs • sar • /proc • REST APIs
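Most of these sources need only a few lines of glue to become metrics. As one example, here is a Python sketch that turns the contents of Linux's `/proc/loadavg` into `path value epoch` lines. The `endpoint.web01` prefix and the function name are assumptions, and the sample string stands in for the real file.

```python
def loadavg_metrics(contents, prefix, timestamp):
    """Turn the contents of /proc/loadavg into 'path value epoch' lines."""
    one, five, fifteen = contents.split()[:3]
    return [
        f"{prefix}.load.1min {one} {timestamp}",
        f"{prefix}.load.5min {five} {timestamp}",
        f"{prefix}.load.15min {fifteen} {timestamp}",
    ]

# Sample file contents; on Linux, read them with open("/proc/loadavg").read().
sample = "0.42 0.36 0.30 1/123 4567\n"
for line in loadavg_metrics(sample, "endpoint.web01", 1325376000):
    print(line)
```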

Slide 124

Slide 124 text

You can’t be serious!

Slide 125

Slide 125 text

You can’t be serious! • You’re thinking to yourself...

Slide 126

Slide 126 text

You can’t be serious! • You’re thinking to yourself... • This sounds like a lot of work.

Slide 127

Slide 127 text

You can’t be serious! • You’re thinking to yourself... • This sounds like a lot of work. • Our developers will never buy in.

Slide 128

Slide 128 text

You can’t be serious! • You’re thinking to yourself... • This sounds like a lot of work. • Our developers will never buy in. • Is there an easier way?

Slide 129

Slide 129 text

Duh.

Slide 130

Slide 130 text

No content

Slide 131

Slide 131 text

StatsD

Slide 132

Slide 132 text

StatsD • Created and released by Etsy

Slide 133

Slide 133 text

StatsD • Created and released by Etsy • "Measure Anything, Measure Everything"

Slide 134

Slide 134 text

StatsD • Created and released by Etsy • "Measure Anything, Measure Everything" • Aggregate counters and timers

Slide 135

Slide 135 text

StatsD • Created and released by Etsy • "Measure Anything, Measure Everything" • Aggregate counters and timers • Pipeline to Graphite

Slide 136

Slide 136 text

StatsD • Created and released by Etsy • "Measure Anything, Measure Everything" • Aggregate counters and timers • Pipeline to Graphite • Fire-and-forget (UDP)

Slide 137

Slide 137 text

StatsD • Created and released by Etsy • "Measure Anything, Measure Everything" • Aggregate counters and timers • Pipeline to Graphite • Fire-and-forget (UDP) • Perl, PHP, Python and Java clients

Slide 138

Slide 138 text

StatsD • Created and released by Etsy • "Measure Anything, Measure Everything" • Aggregate counters and timers • Pipeline to Graphite • Fire-and-forget (UDP) • Perl, PHP, Python and Java clients • https://github.com/etsy/statsd

Slide 139

Slide 139 text

StatsD

Slide 140

Slide 140 text

StatsD • https://github.com/sivy/statsd-client

Slide 141

Slide 141 text

StatsD • https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment('endpoint.customer.app.metric'); $c->timing('endpoint.customer.app.foo', 200);

Slide 142

Slide 142 text

StatsD • https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment('endpoint.customer.app.metric'); $c->timing('endpoint.customer.app.foo', 200); • Too much activity? Sample it!

Slide 143

Slide 143 text

StatsD • https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment('endpoint.customer.app.metric'); $c->timing('endpoint.customer.app.foo', 200); • Too much activity? Sample it! # sample 10%, StatsD will multiply it up $c->increment('endpoint.customer.app.metric', 0.1);
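Under the client library, a StatsD update is a single short UDP datagram: `name:value|c` for a counter, `name:value|ms` for a timer, with `|@rate` appended when sampling. The Python sketch below shows that wire format. The function names are illustrative, and port 8125 is assumed as the usual StatsD default.

```python
import random
import socket

def statsd_packet(name, value, kind, sample_rate=1.0):
    """Format a StatsD datagram: 'name:value|type', plus '|@rate' when sampled."""
    packet = f"{name}:{value}|{kind}"
    if sample_rate < 1.0:
        packet += f"|@{sample_rate}"
    return packet

def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget: one UDP datagram, no reply expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet.encode("ascii"), (host, port))

def increment(name, sample_rate=1.0):
    # Send only a fraction of events; the rate in the packet lets StatsD scale the count back up.
    if sample_rate >= 1.0 or random.random() < sample_rate:
        send(statsd_packet(name, 1, "c", sample_rate))

print(statsd_packet("endpoint.customer.app.metric", 1, "c", 0.1))
print(statsd_packet("endpoint.customer.app.foo", 200, "ms"))
```

Because nothing waits for a reply, instrumented code pays almost nothing per call, which is what makes "measure everything" practical.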

Slide 144

Slide 144 text

That’s enough?

Slide 145

Slide 145 text

Too much awesome?

Slide 146

Slide 146 text

Too Bad!

Slide 147

Slide 147 text

Logster

Slide 148

Slide 148 text

Logster • Yet Another Etsy Project (YAEP)

Slide 149

Slide 149 text

Logster • Yet Another Etsy Project (YAEP) • Rips metrics from log files

Slide 150

Slide 150 text

Logster • Yet Another Etsy Project (YAEP) • Rips metrics from log files • https://github.com/etsy/logster

Slide 151

Slide 151 text

Logster • Yet Another Etsy Project (YAEP) • Rips metrics from log files • https://github.com/etsy/logster # /usr/sbin/logster --dry-run --output=graphite \ --graphite-host=graphite.example.com:2003 \ SampleLogster /var/log/httpd/access_log
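To give a feel for "ripping metrics from log files", here is a standalone Python sketch that counts HTTP responses per status class from access log lines. It is written in the spirit of a log parser, not against Logster's actual parser API, and the log lines and function name are made up for illustration.

```python
import re
from collections import Counter

# Matches the quoted request and the three-digit status that follows it
# in a common/combined log format line.
_STATUS = re.compile(r'"[^"]*" (\d{3}) ')

def count_status_classes(lines):
    """Count responses per status class (http_2xx, http_4xx, ...) in access log lines."""
    counts = Counter()
    for line in lines:
        match = _STATUS.search(line)
        if match:
            counts[f"http_{match.group(1)[0]}xx"] += 1
    return counts

# Made-up log lines, for illustration only.
sample = [
    '10.0.0.1 - - [01/Jan/2012:00:00:01 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2012:00:00:02 +0000] "GET /missing HTTP/1.1" 404 209',
    '10.0.0.3 - - [01/Jan/2012:00:00:03 +0000] "POST /buy HTTP/1.1" 500 0',
]
print(dict(count_status_classes(sample)))
```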

Slide 152

Slide 152 text

Questions?

Slide 153

Slide 153 text

Thank you