Trending with Purpose

Given at the BaltimorePM user group in June 2011.

Jason Dixon

May 15, 2012

Transcript

  1. Trending with Purpose Jason Dixon

  2. Trending

  3. A general direction in which something is developing or changing.

  4. None
  5. Why do we trend?

  6. Planning for growth.

  7. Predicting or diagnosing failure.

  8. ... instead of finding out from your customer.

  9. Operational Questions

  10. Just because a host or service responds, how do you

    know it’s working?
  11. If you haven’t measured good, how will you recognize bad?

  12. You don’t know what might break, so collect everything now.

  13. "Those who cannot remember the past are condemned to repeat

    it" George Santayana
  14. "I don't care if my servers are on fire as

    long as they're making me money" Business Owner
  15. How can we add Business Value?

  16. Start simple.

  17. Dance with the one who brought you.

  18. None
  19. Nagios

  20. Nagios • Fault Detection

  21. Nagios • Fault Detection • Notifications

  22. Nagios • Fault Detection • Notifications • Escalations

  23. Nagios • Fault Detection • Notifications • Escalations • Acknowledgements/Downtime

  24. Nagios • Fault Detection • Notifications • Escalations • Acknowledgements/Downtime

    • http://www.nagios.org/
  25. Nagios

  26. Nagios • What it does well:

  27. Nagios • What it does well: • Free

  28. Nagios • What it does well: • Free • Extensible

  29. Nagios • What it does well: • Free • Extensible

    • Plugins
  30. Nagios • What it does well: • Free • Extensible

    • Plugins • Configuration templates
  31. Nagios • What it does well: • Free • Extensible

    • Plugins • Configuration templates • Popular (lesser of all free evils)
  32. Nagios • What it does well: • Free • Extensible

    • Plugins • Configuration templates • Popular (lesser of all free evils) • Log metrics (“performance data”)
  33. Nagios

  34. Nagios • Where it sucks:

  35. Nagios • Where it sucks: • Interface

  36. Nagios • Where it sucks: • Interface • (Lack of)

    Scalability
  37. Nagios • Where it sucks: • Interface • (Lack of)

    Scalability • Promotes bad habits
  38. Nagios • Where it sucks: • Interface • (Lack of)

    Scalability • Promotes bad habits • Acknowledgements never expire
  39. Nagios • Where it sucks: • Interface • (Lack of)

    Scalability • Promotes bad habits • Acknowledgements never expire • Configuration (over-)flexibility
  40. Nagios • Where it sucks: • Interface • (Lack of)

    Scalability • Promotes bad habits • Acknowledgements never expire • Configuration (over-)flexibility • Flapping
  41. Nagios

  42. PNP4Nagios

  43. PNP4Nagios • Basic graphing & dashboard capabilities

  44. PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios

    performance data
  45. PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios

    performance data • Creates graphs with RRD
  46. PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios

    performance data • Creates graphs with RRD • Limited introspection/correlation
  47. PNP4Nagios • Basic graphing & dashboard capabilities • Retrieves Nagios

    performance data • Creates graphs with RRD • Limited introspection/correlation • http://www.pnp4nagios.org/
  48. PNP4Nagios

  49. Graphite

  50. Graphite • Metric storage

  51. Graphite • Metric storage • Complex graph creation

  52. Graphite • Metric storage • Complex graph creation • Web

    and “CLI” interfaces
  53. Graphite • Metric storage • Complex graph creation • Web

    and “CLI” interfaces • Created and released by Orbitz.com
  54. Graphite • Metric storage • Complex graph creation • Web

    and “CLI” interfaces • Created and released by Orbitz.com • http://graphite.wikidot.com/
  55. Graphite

  56. Graphite

  57. Graphite • The good stuff:

  58. Graphite • The good stuff: • Horizontally scalable

  59. Graphite • The good stuff: • Horizontally scalable • Rapid

    graph prototyping (CLI)
  60. Graphite • The good stuff: • Horizontally scalable • Rapid

    graph prototyping (CLI) • Graph disparate data points
  61. Graphite • The good stuff: • Horizontally scalable • Rapid

    graph prototyping (CLI) • Graph disparate data points • Numerous formulas available
  62. Graphite • The good stuff: • Horizontally scalable • Rapid

    graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc...
  63. Graphite • The good stuff: • Horizontally scalable • Rapid

    graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc... • Share graphs with other users
  64. Graphite • The good stuff: • Horizontally scalable • Rapid

    graph prototyping (CLI) • Graph disparate data points • Numerous formulas available • derive, transform, average, sum, etc... • Share graphs with other users • Supports existing RRD databases
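
The formulas listed above are applied by wrapping a metric path in a function call inside the `target` parameter of a Graphite render URL. The following is a minimal Python sketch (the deck's own snippets are Perl) under stated assumptions: the host `graphite.example.com` and the metric names are hypothetical, and although `sumSeries` and `derivative` are Graphite function names, the set available depends on the installed version.

```python
from urllib.parse import urlencode


def wrap(function, expression):
    """Wrap a metric path or expression in a Graphite function call."""
    return f"{function}({expression})"


def render_url(base, targets, start="-24h", fmt="png"):
    """Build a render URL for one or more target expressions."""
    # Each target becomes its own repeated "target=" query parameter.
    params = [("target", t) for t in targets]
    params.append(("from", start))
    params.append(("format", fmt))
    return base.rstrip("/") + "/render?" + urlencode(params)


# Hypothetical metrics: sales summed across all web hosts, and the
# rate of change of an ever-increasing request counter.
total_sales = wrap("sumSeries", "web*.app.sales")
request_rate = wrap("derivative", "web01.app.requests")
url = render_url("http://graphite.example.com", [total_sales, request_rate])
```

Both series land on the same graph, which is one way to read "graph disparate data points".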
  65. Graphite

  66. Graphite • The not-so-good stuff:

  67. Graphite • The not-so-good stuff: • Not a dashboard (well,

    sorta)
  68. Graphite • The not-so-good stuff: • Not a dashboard (well,

    sorta) • No hover details
  69. Graphite • The not-so-good stuff: • Not a dashboard (well,

    sorta) • No hover details • Single y-axis
  70. Graphite Whisper Carbon Web Metrics

  71. Carbon

  72. Carbon • agent - starts other daemons, receives metrics and

    pipelines them to cache
  73. Carbon • agent - starts other daemons, receives metrics and

    pipelines them to cache • aggregator - aggregate and transform your data before storage
  74. Carbon • agent - starts other daemons, receives metrics and

    pipelines them to cache • aggregator - aggregate and transform your data before storage • cache - caches metrics for real-time graphing, pipelines them to persister
  75. Carbon • agent - starts other daemons, receives metrics and

    pipelines them to cache • aggregator - aggregate and transform your data before storage • cache - caches metrics for real-time graphing, pipelines them to persister • persister - writes persistent data to disk
  76. Whisper

  77. Whisper • Metrics database format

  78. Whisper • Metrics database format • Supplanted RRDtool

  79. Whisper • Metrics database format • Supplanted RRDtool • Accepts

    out-of-order data
  80. Whisper • Metrics database format • Supplanted RRDtool • Accepts

    out-of-order data • Supports pipelining of data in a single operation (multiplexing)
  81. Coming Soon - Ceres

  82. Coming Soon - Ceres • Smaller files - doesn’t pad

    missing datapoints
  83. Coming Soon - Ceres • Smaller files - doesn’t pad

    missing datapoints • Doesn’t store timestamps, calculates them
  84. Coming Soon - Ceres • Smaller files - doesn’t pad

    missing datapoints • Doesn’t store timestamps, calculates them • Federates individual datapoints
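
The idea of calculating timestamps instead of storing them can be sketched in general terms: if a file holds values at a fixed interval and records only its start time and step, the timestamp of any value follows from its position. This Python sketch shows that technique only. The `Slice` class and its fields are hypothetical and do not describe the actual Ceres on-disk format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Slice:
    """Hypothetical fixed-interval series: values only, no stored timestamps."""
    start: int                      # epoch seconds of the first value
    step: int                       # seconds between consecutive values
    values: List[float] = field(default_factory=list)

    def timestamp_at(self, index: int) -> int:
        # The timestamp is derived from the position, not read from storage.
        return self.start + index * self.step

    def index_of(self, timestamp: int) -> Optional[int]:
        # Map a timestamp back to a position, or None if it is not
        # aligned to the step or falls outside the stored range.
        offset, remainder = divmod(timestamp - self.start, self.step)
        if remainder or not 0 <= offset < len(self.values):
            return None
        return offset

    def points(self) -> List[Tuple[int, float]]:
        return [(self.timestamp_at(i), v) for i, v in enumerate(self.values)]
```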
  85. Graphite Web

  86. Graphite Web • Traditional web interface

  87. Graphite Web • Traditional web interface • Javascript CLI

  88. Graphite Web • Traditional web interface • Javascript CLI •

    Rudimentary dashboard
  89. Graphite Web • Traditional web interface • Javascript CLI •

    Rudimentary dashboard • Django application
  90. Sending metrics to Graphite

  91. Sending metrics to Graphite • Connect to Carbon socket (tcp/2003)

  92. Sending metrics to Graphite • Connect to Carbon socket (tcp/2003)

    • Send your data
  93. Sending metrics to Graphite • Connect to Carbon socket (tcp/2003)

    • Send your data use IO::Socket::INET; my $sock = IO::Socket::INET->new('127.0.0.1:2003'); $sock->send("endpoint.app.metric $value $epoch\n");
  94. Sending metrics to Graphite • Connect to Carbon socket (tcp/2003)

    • Send your data use IO::Socket::INET; my $sock = IO::Socket::INET->new('127.0.0.1:2003'); $sock->send("endpoint.app.metric $value $epoch\n"); • ???
  95. Sending metrics to Graphite • Connect to Carbon socket (tcp/2003)

    • Send your data use IO::Socket::INET; my $sock = IO::Socket::INET->new('127.0.0.1:2003'); $sock->send("endpoint.app.metric $value $epoch\n"); • ??? • Profit!
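
The same two steps, connect and send a `path value epoch` line, can be sketched in Python for readers who do not use Perl. This is a minimal sketch, assuming a Carbon listener on `127.0.0.1:2003` as on the slide; the function names `format_metric` and `send_metric` are illustrative, not part of Graphite.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Return one newline-terminated 'path value epoch' line."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, timestamp=None, host="127.0.0.1", port=2003):
    """Open a TCP connection to Carbon, send one metric line, and close."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling `send_metric("endpoint.app.metric", 42)` needs a running Carbon daemon; `format_metric` can be checked on its own without one.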
  96. What should we collect?

  97. App/DB Profiling

  98. App/DB Profiling • How fast is our:

  99. App/DB Profiling • How fast is our: • function foo()

    for each iteration
  100. App/DB Profiling • How fast is our: • function foo()

    for each iteration • SQL query
  101. App/DB Profiling • How fast is our: • function foo()

    for each iteration • SQL query • 3rd party API service (e.g. payment gateway, social media)
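
Answering "how fast is our function foo()" usually means wrapping the call in a timer and recording the elapsed time under a metric name. A minimal Python sketch follows, assuming the timings are collected in a plain list before being shipped anywhere; `timed`, `foo` and the metric name are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(path, sink):
    """Time the enclosed block and append (path, elapsed_ms) to sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the duration even if the block raises.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        sink.append((path, elapsed_ms))


def foo():
    """Hypothetical stand-in for the function being profiled."""
    return sum(range(1000))


timings = []
for _ in range(3):
    with timed("endpoint.app.foo.duration_ms", timings):
        foo()
```

The same wrapper fits around a SQL query or a third-party API call, since it only cares about the enclosed block.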
  102. App/DB Profiling

  103. App/DB Profiling • How many times do we:

  104. App/DB Profiling • How many times do we: • call

    function foo()
  105. App/DB Profiling • How many times do we: • call

    function foo() • register a new user
  106. App/DB Profiling • How many times do we: • call

    function foo() • register a new user • chargeback a sale
  107. IT exists to support Business.

  108. IT exists to support Business. DevOps

  109. Not the other way around.

  110. Business Profiling

  111. Business Profiling • How many sales did we generate this

    hour:
  112. Business Profiling • How many sales did we generate this

    hour: • per sku
  113. Business Profiling • How many sales did we generate this

    hour: • per sku • from the recent ad campaign
  114. Business Profiling • How many sales did we generate this

    hour: • per sku • from the recent ad campaign • from users in West Bumble, Arkansas
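
Hourly business numbers like these reduce to bucketing events by hour and by a dimension such as SKU, then emitting one metric per bucket. A minimal Python sketch, assuming each sale is available as an `(epoch_seconds, sku)` pair; the sample records and the `business.sales.sku` prefix are hypothetical.

```python
from collections import Counter


def hourly_sales(sales):
    """Count sales per (hour, sku); each sale is (epoch_seconds, sku)."""
    counts = Counter()
    for epoch, sku in sales:
        hour = epoch - (epoch % 3600)   # truncate to the start of the hour
        counts[(hour, sku)] += 1
    return counts


def to_metric_lines(counts, prefix="business.sales.sku"):
    """Render counts as 'path value epoch' lines, sorted for stable output."""
    return [f"{prefix}.{sku} {n} {hour}"
            for (hour, sku), n in sorted(counts.items())]


# Hypothetical sales records: (epoch seconds, sku)
sales = [
    (1300000000, "A100"),
    (1300000500, "A100"),
    (1300000900, "B200"),
    (1300003700, "A100"),
]
lines = to_metric_lines(hourly_sales(sales))
```

Campaign or region would work the same way: add the dimension to the bucket key and to the metric path.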
  115. Last week?

  116. Last month?

  117. Last year?

  118. Be Creative.

  119. More Data > Less Data

  120. You probably won’t know what metrics you might need until

    it’s too late.
  121. Don’t be that guy.

  122. Storage is Cheap.

  123. Data Sources • Nagios • Munin • SNMP • Ganglia

    • collectd • SQL • Logs • sar • /proc • REST APIs
  124. You can’t be serious!

  125. You can’t be serious! • You’re thinking to yourself...

  126. You can’t be serious! • You’re thinking to yourself... •

    This sounds like a lot of work.
  127. You can’t be serious! • You’re thinking to yourself... •

    This sounds like a lot of work. • Our developers will never buy in.
  128. You can’t be serious! • You’re thinking to yourself... •

    This sounds like a lot of work. • Our developers will never buy in. • Is there an easier way?
  129. Duh.

  130. None
  131. StatsD

  132. StatsD • Created and released by Etsy

  133. StatsD • Created and released by Etsy • "Measure Anything,

    Measure Everything"
  134. StatsD • Created and released by Etsy • "Measure Anything,

    Measure Everything" • Aggregate counters and timers
  135. StatsD • Created and released by Etsy • "Measure Anything,

    Measure Everything" • Aggregate counters and timers • Pipeline to Graphite
  136. StatsD • Created and released by Etsy • "Measure Anything,

    Measure Everything" • Aggregate counters and timers • Pipeline to Graphite • Fire-and-forget (UDP)
  137. StatsD • Created and released by Etsy • "Measure Anything,

    Measure Everything" • Aggregate counters and timers • Pipeline to Graphite • Fire-and-forget (UDP) • Perl, PHP, Python and Java clients
  138. StatsD • Created and released by Etsy • "Measure Anything,

    Measure Everything" • Aggregate counters and timers • Pipeline to Graphite • Fire-and-forget (UDP) • Perl, PHP, Python and Java clients • https://github.com/etsy/statsd
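
What a StatsD client puts on the wire is small: one text datagram per event, sent over UDP with no reply. A minimal Python sketch of the counter and timer formats, assuming a StatsD daemon on `127.0.0.1` at UDP port 8125 (its usual default, but check your configuration); the helper names are illustrative.

```python
import socket


def counter(name, delta=1):
    """Format a StatsD counter datagram, e.g. 'name:1|c'."""
    return f"{name}:{delta}|c"


def timer(name, milliseconds):
    """Format a StatsD timer datagram, e.g. 'name:200|ms'."""
    return f"{name}:{milliseconds}|ms"


def send(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget: one UDP datagram, no reply expected or awaited."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("ascii"), (host, port))
```

Because the send is UDP, a stopped or unreachable StatsD daemon does not block the application; the datagrams are simply lost.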
  139. StatsD

  140. StatsD • https://github.com/sivy/statsd-client

  141. StatsD • https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment('endpoint.customer.app.metric');

    $c->timing('endpoint.customer.app.foo', 200);
  142. StatsD • https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment('endpoint.customer.app.metric');

    $c->timing('endpoint.customer.app.foo', 200); • Too much activity? Sample it!
  143. StatsD • https://github.com/sivy/statsd-client use Net::StatsD::Client; my $c = Net::StatsD::Client->new(); $c->increment('endpoint.customer.app.metric');

    $c->timing('endpoint.customer.app.foo', 200); • Too much activity? Sample it! $c->increment('endpoint.customer.app.metric', 0.1); # sample 10%; StatsD will multiply it back up
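
Sampling works in two halves: the client sends only a fraction of events and tags each datagram with the rate, and the server divides what it receives by that rate to estimate the true count. A minimal Python sketch of both halves follows; `sampled_counter` and `scaled_count` are hypothetical helpers, and the simulation uses a seeded generator so that runs are repeatable.

```python
import random


def sampled_counter(name, rate, rng=random.random):
    """Return a sampled counter datagram, or None when the event is skipped."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    if rng() >= rate:
        return None                      # dropped on the client, nothing sent
    return f"{name}:1|c|@{rate}"


def scaled_count(received, rate):
    """Estimate the true count the way a server would: received / rate."""
    return received / rate


# Simulate 10,000 events at a 10% sample rate.
rng = random.Random(42)
sent = [
    d
    for d in (
        sampled_counter("endpoint.customer.app.metric", 0.1, rng.random)
        for _ in range(10_000)
    )
    if d is not None
]
estimate = scaled_count(len(sent), 0.1)
```

The estimate is approximate by design: fewer packets in exchange for some noise in the count.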
  144. That’s enough?

  145. Too much awesome?

  146. Too Bad!

  147. Logster

  148. Logster • Yet Another Etsy Project (YAEP)

  149. Logster • Yet Another Etsy Project (YAEP) • Rips metrics

    from log files
  150. Logster • Yet Another Etsy Project (YAEP) • Rips metrics

    from log files • https://github.com/etsy/logster
  151. Logster • Yet Another Etsy Project (YAEP) • Rips metrics

    from log files • https://github.com/etsy/logster # /usr/sbin/logster --dry-run --output=graphite --graphite-host=graphite.example.com:2003 SampleLogster /var/log/httpd/access_log
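
"Rips metrics from log files" means turning each log line into a counter increment. The Python sketch below illustrates that idea on its own and does not use Logster's parser interface. It assumes common-format access log lines, and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Matches the status code that follows the quoted request in a
# common/combined format access log line.
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def count_status_classes(lines):
    """Count responses per status class (http_2xx, http_4xx, ...)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[f"http_{match.group(1)[0]}xx"] += 1
    return counts


# Hypothetical access log lines.
sample = [
    '10.0.0.1 - - [15/May/2012:10:00:00 -0400] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [15/May/2012:10:00:01 -0400] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.3 - - [15/May/2012:10:00:02 -0400] "POST /buy HTTP/1.1" 500 64',
    '10.0.0.1 - - [15/May/2012:10:00:03 -0400] "GET /about HTTP/1.1" 200 2048',
]
counts = count_status_classes(sample)
```

Each resulting count could then be sent to Graphite as one metric per status class.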
  152. Questions?

  153. Thank you