Trending with Purpose

Given at the BaltimorePM user group in June 2011.

Jason Dixon

May 15, 2012

Transcript

  1. Trending with Purpose
    Jason Dixon

  2. Trending

  3. A general direction in which something is
    developing or changing.

  4. View Slide

  5. Why do we trend?

  6. Planning for growth.

  7. Predicting or
    diagnosing failure.

  8. ... instead of finding out
    from your customer.

  9. Operational Questions

  10. Just because a host or service responds,
    how do you know it’s working?

  11. If you haven’t measured good, how will
    you recognize bad?

  12. You don’t know what might break, so
    collect everything now.

  13. "Those who cannot remember the
    past are condemned to repeat it"
    George Santayana

  14. "I don't care if my servers are on fire as
    long as they're making me money"
    Business Owner

  15. How can we add
    Business Value?

  16. Start simple.

  17. Dance with the one who
    brought you.

  18. View Slide

  19. Nagios

  20. Nagios
    • Fault Detection

  21. Nagios
    • Fault Detection
    • Notifications

  22. Nagios
    • Fault Detection
    • Notifications
    • Escalations

  23. Nagios
    • Fault Detection
    • Notifications
    • Escalations
    • Acknowledgements/Downtime

  24. Nagios
    • Fault Detection
    • Notifications
    • Escalations
    • Acknowledgements/Downtime
    • http://www.nagios.org/

  25. Nagios

  26. Nagios
    • What it does well:

  27. Nagios
    • What it does well:
    • Free

  28. Nagios
    • What it does well:
    • Free
    • Extensible

  29. Nagios
    • What it does well:
    • Free
    • Extensible
    • Plugins

  30. Nagios
    • What it does well:
    • Free
    • Extensible
    • Plugins
    • Configuration templates

  31. Nagios
    • What it does well:
    • Free
    • Extensible
    • Plugins
    • Configuration templates
    • Popular (lesser of all free evils)

  32. Nagios
    • What it does well:
    • Free
    • Extensible
    • Plugins
    • Configuration templates
    • Popular (lesser of all free evils)
    • Log metrics (“performance data”)
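
The "performance data" mentioned above is the text a Nagios plugin prints after a `|`, as space-separated `label=value[unit];warn;crit;min;max` items. A minimal Perl sketch of pulling labels and values out of one line of plugin output could look like the following. `parse_perfdata` and the sample load-average output are made up for illustration, thresholds are ignored, and quoted labels containing spaces are not handled.

```perl
use strict;
use warnings;

# Split plugin output on the first "|" and parse "label=value[unit]" items.
# Assumption: labels contain no spaces; thresholds after ";" are ignored.
sub parse_perfdata {
    my ($output) = @_;
    my (undef, $perf) = split /\|/, $output, 2;
    return {} unless defined $perf;
    my %metric;
    for my $item (split ' ', $perf) {
        next unless $item =~ /^([^=]+)=(-?[\d.]+)([^;]*)/;
        $metric{$1} = { value => $2 + 0, unit => $3 };
    }
    return \%metric;
}

# Hypothetical plugin output, shaped like a load-average check.
my $m = parse_perfdata(
    'OK - load average: 0.10, 0.25, 0.30|load1=0.100;5.000;10.000;0; load5=0.250;4.000;8.000;0;'
);
printf "%s %s\n", $_, $m->{$_}{value} for sort keys %$m;
```

Each label/value pair is then one candidate metric for a graphing backend.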

  33. Nagios

  34. Nagios
    • Where it sucks:

  35. Nagios
    • Where it sucks:
    • Interface

  36. Nagios
    • Where it sucks:
    • Interface
    • (Lack of) Scalability

  37. Nagios
    • Where it sucks:
    • Interface
    • (Lack of) Scalability
    • Promotes bad habits

  38. Nagios
    • Where it sucks:
    • Interface
    • (Lack of) Scalability
    • Promotes bad habits
    • Acknowledgements never expire

  39. Nagios
    • Where it sucks:
    • Interface
    • (Lack of) Scalability
    • Promotes bad habits
    • Acknowledgements never expire
    • Configuration (over-)flexibility

  40. Nagios
    • Where it sucks:
    • Interface
    • (Lack of) Scalability
    • Promotes bad habits
    • Acknowledgements never expire
    • Configuration (over-)flexibility
    • Flapping

  41. Nagios

  42. PNP4Nagios

  43. PNP4Nagios
    • Basic graphing & dashboard capabilities

  44. PNP4Nagios
    • Basic graphing & dashboard capabilities
    • Retrieves Nagios performance data

  45. PNP4Nagios
    • Basic graphing & dashboard capabilities
    • Retrieves Nagios performance data
    • Creates graphs with RRD

  46. PNP4Nagios
    • Basic graphing & dashboard capabilities
    • Retrieves Nagios performance data
    • Creates graphs with RRD
    • Limited introspection/correlation

  47. PNP4Nagios
    • Basic graphing & dashboard capabilities
    • Retrieves Nagios performance data
    • Creates graphs with RRD
    • Limited introspection/correlation
    • http://www.pnp4nagios.org/

  48. PNP4Nagios

  49. Graphite

  50. Graphite
    • Metric storage

  51. Graphite
    • Metric storage
    • Complex graph creation

  52. Graphite
    • Metric storage
    • Complex graph creation
    • Web and “CLI” interfaces

  53. Graphite
    • Metric storage
    • Complex graph creation
    • Web and “CLI” interfaces
    • Created and released by Orbitz.com

  54. Graphite
    • Metric storage
    • Complex graph creation
    • Web and “CLI” interfaces
    • Created and released by Orbitz.com
    • http://graphite.wikidot.com/

  55. Graphite

  56. Graphite

  57. Graphite
    • The good stuff:

  58. Graphite
    • The good stuff:
    • Horizontally scalable

  59. Graphite
    • The good stuff:
    • Horizontally scalable
    • Rapid graph prototyping (CLI)

  60. Graphite
    • The good stuff:
    • Horizontally scalable
    • Rapid graph prototyping (CLI)
    • Graph disparate data points

  61. Graphite
    • The good stuff:
    • Horizontally scalable
    • Rapid graph prototyping (CLI)
    • Graph disparate data points
    • Numerous formulas available

  62. Graphite
    • The good stuff:
    • Horizontally scalable
    • Rapid graph prototyping (CLI)
    • Graph disparate data points
    • Numerous formulas available
    • derive, transform, average, sum, etc...

  63. Graphite
    • The good stuff:
    • Horizontally scalable
    • Rapid graph prototyping (CLI)
    • Graph disparate data points
    • Numerous formulas available
    • derive, transform, average, sum, etc...
    • Share graphs with other users

  64. Graphite
    • The good stuff:
    • Horizontally scalable
    • Rapid graph prototyping (CLI)
    • Graph disparate data points
    • Numerous formulas available
    • derive, transform, average, sum, etc...
    • Share graphs with other users
    • Supports existing RRD databases
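
The formulas listed above are applied by wrapping metric paths in function calls inside the `target` parameter of Graphite's render URL. The Perl sketch below only builds such a URL and does not fetch it. `render_url` and `uri_escape_min` are hypothetical helpers, the hostname is a placeholder, and `sumSeries` with a 24-hour window is just one possible combination.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters.
sub uri_escape_min {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build a render URL from a base address, a target expression,
# and optional query parameters such as "from".
sub render_url {
    my ($base, $target, %param) = @_;
    my @query = ('target=' . uri_escape_min($target));
    push @query, $_ . '=' . uri_escape_min($param{$_}) for sort keys %param;
    return $base . '/render?' . join('&', @query);
}

# Sum one metric across every endpoint over the last 24 hours.
my $url = render_url(
    'http://graphite.example.com',
    'sumSeries(endpoint.*.app.metric)',
    from => '-24h',
);
print "$url\n";
```

Swapping `sumSeries` for another function such as `averageSeries` or `derivative` changes the graph without touching the stored data.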

  65. Graphite

  66. Graphite
    • The not-so-good stuff:

  67. Graphite
    • The not-so-good stuff:
    • Not a dashboard (well, sorta)

  68. Graphite
    • The not-so-good stuff:
    • Not a dashboard (well, sorta)
    • No hover details

  69. Graphite
    • The not-so-good stuff:
    • Not a dashboard (well, sorta)
    • No hover details
    • Single y-axis

  70. Graphite
    Whisper
    Carbon
    Web Metrics

  71. Carbon

  72. Carbon
    • agent - starts other daemons, receives metrics and
    pipelines them to cache

  73. Carbon
    • agent - starts other daemons, receives metrics and
    pipelines them to cache
    • aggregator - aggregate and transform your data
    before storage

  74. Carbon
    • agent - starts other daemons, receives metrics and
    pipelines them to cache
    • aggregator - aggregate and transform your data
    before storage
    • cache - caches metrics for real-time graphing,
    pipelines them to persister

  75. Carbon
    • agent - starts other daemons, receives metrics and
    pipelines them to cache
    • aggregator - aggregate and transform your data
    before storage
    • cache - caches metrics for real-time graphing,
    pipelines them to persister
    • persister - writes persistent data to disk

  76. Whisper

  77. Whisper
    • Metrics database format

  78. Whisper
    • Metrics database format
    • Supplanted RRDtool

  79. Whisper
    • Metrics database format
    • Supplanted RRDtool
    • Accepts out-of-order data

  80. Whisper
    • Metrics database format
    • Supplanted RRDtool
    • Accepts out-of-order data
    • Supports pipelining of data in a single operation
    (multiplexing)

  81. Coming Soon - Ceres

  82. Coming Soon - Ceres
    • Smaller files - doesn’t pad missing datapoints

  83. Coming Soon - Ceres
    • Smaller files - doesn’t pad missing datapoints
    • Doesn’t store timestamps, calculates them

  84. Coming Soon - Ceres
    • Smaller files - doesn’t pad missing datapoints
    • Doesn’t store timestamps, calculates them
    • Federates individual datapoints

  85. Graphite Web

  86. Graphite Web
    • Traditional web interface

  87. Graphite Web
    • Traditional web interface
    • Javascript CLI

  88. Graphite Web
    • Traditional web interface
    • Javascript CLI
    • Rudimentary dashboard

  89. Graphite Web
    • Traditional web interface
    • Javascript CLI
    • Rudimentary dashboard
    • Django application

  90. Sending metrics to Graphite

  91. Sending metrics to Graphite
    • Connect to Carbon socket (tcp/2003)

  92. Sending metrics to Graphite
    • Connect to Carbon socket (tcp/2003)
    • Send your data

  93. Sending metrics to Graphite
    • Connect to Carbon socket (tcp/2003)
    • Send your data
    use IO::Socket::INET;
    my $sock = IO::Socket::INET->new('127.0.0.1:2003');
    $sock->send("endpoint.app.metric $value $epoch\n");

  94. Sending metrics to Graphite
    • Connect to Carbon socket (tcp/2003)
    • Send your data
    use IO::Socket::INET;
    my $sock = IO::Socket::INET->new('127.0.0.1:2003');
    $sock->send("endpoint.app.metric $value $epoch\n");
    • ???

  95. Sending metrics to Graphite
    • Connect to Carbon socket (tcp/2003)
    • Send your data
    use IO::Socket::INET;
    my $sock = IO::Socket::INET->new('127.0.0.1:2003');
    $sock->send("endpoint.app.metric $value $epoch\n");
    • ???
    • Profit!
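
The two-line send on this slide can be fleshed out with a connection check and a default timestamp. What follows is a sketch under assumptions, not the talk's own code: `format_metric` and `send_metric` are hypothetical helper names, and the host and port defaults simply repeat the values from the slide.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Build one line of Carbon's plaintext protocol: "<path> <value> <epoch>\n".
sub format_metric {
    my ($path, $value, $epoch) = @_;
    $epoch = time unless defined $epoch;
    return "$path $value $epoch\n";
}

# Open a TCP connection to Carbon, send one metric, and close the socket.
# Dies if the connection cannot be made.
sub send_metric {
    my ($path, $value, %opt) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $opt{host} // '127.0.0.1',
        PeerPort => $opt{port} // 2003,
        Proto    => 'tcp',
        Timeout  => 5,
    ) or die "cannot connect to Carbon: $@";
    print {$sock} format_metric($path, $value, $opt{epoch});
    close $sock;
    return 1;
}

# Example call (needs a running Carbon listener):
# send_metric('endpoint.app.metric', 42);
```

Opening a socket per metric keeps the sketch short. A long-running collector would more likely hold one connection open and write many lines to it.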

  96. What should we collect?

  97. App/DB Profiling

  98. App/DB Profiling
    • How fast is our:

  99. App/DB Profiling
    • How fast is our:
    • function foo() for each iteration

  100. App/DB Profiling
    • How fast is our:
    • function foo() for each iteration
    • SQL query

  101. App/DB Profiling
    • How fast is our:
    • function foo() for each iteration
    • SQL query
    • 3rd party API service (e.g. payment gateway,
    social media)
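
Answering "how fast is foo()" comes down to wrapping the call in a timer and emitting the elapsed time as a metric. A minimal Perl sketch, assuming the core `Time::HiRes` module: `time_call`, the stand-in `foo`, and the metric name `endpoint.app.foo.duration_ms` are all hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference and return (elapsed milliseconds, result).
sub time_call {
    my ($code, @args) = @_;
    my $start  = [gettimeofday];
    my $result = $code->(@args);
    my $ms     = int(tv_interval($start) * 1000 + 0.5);
    return ($ms, $result);
}

# Hypothetical stand-in for the function being profiled.
sub foo { my $n = shift; my $sum = 0; $sum += $_ for 1 .. $n; return $sum }

my ($ms, $result) = time_call(\&foo, 100_000);

# One line in Carbon's plaintext format, ready to write to the socket.
my $line = sprintf("endpoint.app.foo.duration_ms %d %d\n", $ms, time);
print $line;
```

The same wrapper works around an SQL query or a third-party API call, since it only needs a code reference.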

  102. App/DB Profiling

  103. App/DB Profiling
    • How many times do we:

  104. App/DB Profiling
    • How many times do we:
    • call function foo()

  105. App/DB Profiling
    • How many times do we:
    • call function foo()
    • register a new user

  106. App/DB Profiling
    • How many times do we:
    • call function foo()
    • register a new user
    • chargeback a sale

  107. IT exists to support
    Business.

  108. IT exists to support
    Business.
    DevOps

  109. Not the other way
    around.

  110. Business Profiling

  111. Business Profiling
    • How many sales did we generate this hour:

  112. Business Profiling
    • How many sales did we generate this hour:
    • per sku

  113. Business Profiling
    • How many sales did we generate this hour:
    • per sku
    • from the recent ad campaign

  114. Business Profiling
    • How many sales did we generate this hour:
    • per sku
    • from the recent ad campaign
    • from users in West Bumble, Arkansas
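
Sales per hour per SKU is a counting problem: truncate each sale's timestamp to the start of its hour and count by SKU. The Perl sketch below does that for a few invented records and prints one Carbon plaintext line per hour and SKU. The record keys (`epoch`, `sku`), the SKU values, and the `business.sales.sku.*` metric names are assumptions for illustration.

```perl
use strict;
use warnings;

# Count sales per SKU per hour. Each sale is a hash reference with the
# hypothetical keys "epoch" (seconds) and "sku".
sub hourly_sales {
    my (@sales) = @_;
    my %count;
    for my $sale (@sales) {
        my $hour = $sale->{epoch} - ($sale->{epoch} % 3600);    # start of the hour
        $count{$hour}{ $sale->{sku} }++;
    }
    return \%count;
}

# Turn the counts into Carbon plaintext lines, one per hour and SKU.
sub sales_lines {
    my ($count) = @_;
    my @lines;
    for my $hour (sort { $a <=> $b } keys %$count) {
        for my $sku (sort keys %{ $count->{$hour} }) {
            push @lines, sprintf("business.sales.sku.%s %d %d\n",
                $sku, $count->{$hour}{$sku}, $hour);
        }
    }
    return @lines;
}

my @lines = sales_lines(hourly_sales(
    { epoch => 1307000100, sku => 'A100' },
    { epoch => 1307000200, sku => 'A100' },
    { epoch => 1307003700, sku => 'B200' },
));
print @lines;
```

Adding a campaign or region key to the metric path gives the other breakdowns from the slide.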

  115. Last week?

  116. Last month?

  117. Last year?

  118. Be Creative.

  119. More Data > Less Data

  120. You probably won’t know what metrics
    you might need until it’s too late.

  121. Don’t be that guy.

  122. Storage is Cheap.

  123. Data Sources
    • Nagios
    • Munin
    • SNMP
    • Ganglia
    • collectd
    • SQL
    • Logs
    • sar
    • /proc
    • REST APIs

  124. You can’t be serious!

  125. You can’t be serious!
    • You’re thinking to yourself...

  126. You can’t be serious!
    • You’re thinking to yourself...
    • This sounds like a lot of work.

  127. You can’t be serious!
    • You’re thinking to yourself...
    • This sounds like a lot of work.
    • Our developers will never buy in.

  128. You can’t be serious!
    • You’re thinking to yourself...
    • This sounds like a lot of work.
    • Our developers will never buy in.
    • Is there an easier way?

  129. Duh.

  130. View Slide

  131. StatsD

  132. StatsD
    • Created and released by Etsy

  133. StatsD
    • Created and released by Etsy
    • "Measure Anything, Measure Everything"

  134. StatsD
    • Created and released by Etsy
    • "Measure Anything, Measure Everything"
    • Aggregate counters and timers

  135. StatsD
    • Created and released by Etsy
    • "Measure Anything, Measure Everything"
    • Aggregate counters and timers
    • Pipeline to Graphite

  136. StatsD
    • Created and released by Etsy
    • "Measure Anything, Measure Everything"
    • Aggregate counters and timers
    • Pipeline to Graphite
    • Fire-and-forget (UDP)

  137. StatsD
    • Created and released by Etsy
    • "Measure Anything, Measure Everything"
    • Aggregate counters and timers
    • Pipeline to Graphite
    • Fire-and-forget (UDP)
    • Perl, PHP, Python and Java clients

  138. StatsD
    • Created and released by Etsy
    • "Measure Anything, Measure Everything"
    • Aggregate counters and timers
    • Pipeline to Graphite
    • Fire-and-forget (UDP)
    • Perl, PHP, Python and Java clients
    • https://github.com/etsy/statsd

  139. StatsD

  140. StatsD
    • https://github.com/sivy/statsd-client

  141. StatsD
    • https://github.com/sivy/statsd-client
    use Net::StatsD::Client;
    my $c = Net::StatsD::Client->new();
    $c->increment('endpoint.customer.app.metric');
    $c->timing('endpoint.customer.app.foo', 200);

  142. StatsD
    • https://github.com/sivy/statsd-client
    use Net::StatsD::Client;
    my $c = Net::StatsD::Client->new();
    $c->increment('endpoint.customer.app.metric');
    $c->timing('endpoint.customer.app.foo', 200);
    • Too much activity? Sample it!

  143. StatsD
    • https://github.com/sivy/statsd-client
    use Net::StatsD::Client;
    my $c = Net::StatsD::Client->new();
    $c->increment('endpoint.customer.app.metric');
    $c->timing('endpoint.customer.app.foo', 200);
    • Too much activity? Sample it!
    # sample 10%, StatsD will multiply it up
    $c->increment('endpoint.customer.app.metric', 0.1);
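
Sampling works because the client sends only a fraction of the increments and tags each packet with the rate, so the server can scale the count back up. The Perl sketch below shows that idea with a hand-rolled sender in StatsD's `bucket:1|c|@rate` line format. It is not the code of the client library on the slide: `counter_payload` and `sampled_increment` are hypothetical names, and the commented-out socket setup assumes a StatsD daemon on localhost UDP port 8125.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Format a counter increment in the StatsD line format "<bucket>:1|c",
# appending "|@<rate>" when the call is sampled.
sub counter_payload {
    my ($bucket, $rate) = @_;
    $rate = 1 unless defined $rate;
    return $rate < 1
        ? sprintf('%s:1|c|@%s', $bucket, $rate)
        : sprintf('%s:1|c', $bucket);
}

# Send the increment for roughly $rate of the calls and skip the rest.
# Returns the payload when a packet was sent, nothing when it was skipped.
sub sampled_increment {
    my ($sock, $bucket, $rate) = @_;
    $rate = 1 unless defined $rate;
    return if $rate < 1 && rand() >= $rate;
    my $payload = counter_payload($bucket, $rate);
    $sock->send($payload);
    return $payload;
}

# Example use (assumes a StatsD daemon on localhost, UDP port 8125):
# my $sock = IO::Socket::INET->new(
#     PeerAddr => '127.0.0.1', PeerPort => 8125, Proto => 'udp',
# ) or die "cannot create UDP socket: $@";
# sampled_increment($sock, 'endpoint.customer.app.metric', 0.1);
```

With a rate of 0.1, about one call in ten reaches the network, which is the trade the slide describes: less traffic in exchange for an estimated count.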

  144. That’s enough?

  145. Too much awesome?

  146. Too Bad!

  147. Logster

  148. Logster
    • Yet Another Etsy Project (YAEP)

  149. Logster
    • Yet Another Etsy Project (YAEP)
    • Rips metrics from log files

  150. Logster
    • Yet Another Etsy Project (YAEP)
    • Rips metrics from log files
    • https://github.com/etsy/logster

  151. Logster
    • Yet Another Etsy Project (YAEP)
    • Rips metrics from log files
    • https://github.com/etsy/logster
    # /usr/sbin/logster --dry-run --output=graphite \
    --graphite-host=graphite.example.com:2003 \
    SampleLogster /var/log/httpd/access_log

  152. Questions?

  153. Thank you
