stopwatch - let the data tell a story

Matthew Lyon
February 13, 2013

Talk given at the Portland Data Visualization group [1] about an internal tool I built for troubleshooting CloudFoundry installations across five datacenters on three continents.

[1]: https://groups.google.com/forum/?fromgroups#!forum/pdx-visualization

The type family used is Adobe's free Source Code Pro: http://store1.adobe.com/cfusion/store/html/index.cfm?event=displayFontPackage&code=1959

Transcript

  1. harnessing the power of the eye to help direct troubleshooting efforts across a distributed & service-oriented architecture. stopwatch (will have a better name before open-sourcing; maybe Hermes or BatmanSonar or something)
  2. symptoms of a problem: lots of things are randomly very slow or fail, without a seeming connection to each other except that it tends to happen at the same time
  3. troubleshooting. statsd/graphite: + easy to store data; + stores the right data; - hard to get at the data; - complex API; - wasn’t helping me pinpoint the problem; - doesn’t highlight relationships. tailing log files: + got a lot of raw data; + helped me understand the interaction between system components; - cost sanity and time
  4. 1 month finding the root cause: gitolite on EBS. 1 day writing code to replace gitolite, at my in-laws’. in Yakima, Washington. on Thanksgiving. 1 month convincing people I was right before the high-risk deploy
  5. cloudfoundry: an open-source Platform-as-a-Service toolkit created by VMware. launched with Ruby, Java, and Node.js runtimes; we contributed PHP runtime support; it now also runs Python and Erlang (and if you wanted Perl, it wouldn’t be hard to add). they built it to run on vSphere; we run it on AWS and others
  6. “Hey, it’s really slow right now. Can you take a look?” Sunday, 7:04am. *sigh* on it.
  7. numbers are great, but you suck at stats, especially if you’re not aware you do (unless perhaps you’re German)
  8. the human eye can quickly make sense of a lot of data, but percentiles don’t tell the whole story, and summaries lie too
  9. break out by facets. appRegistry: resolve service via database. dispatcher: import service on Rackspace (oh, and guess what? this one failed)
  10. we run the largest installations of cloudfoundry: some bugs only manifest at the edges. ours is the only one run as a pay-for service on the public cloud (that is, AWS, Rackspace Cloud, etc.)
  11. if you can’t measure from the inside, then observe it from the outside. cf’s deploy mechanism had timeout issues, particularly with AWS/ELB and large apps
  12. drill into the buckets to figure out what’s wrong. in this case, the culprit was a new edge case in creating Java apps
  13. AWS US-East is having problems... again. with EBS... again. this is the AWS outage around Thanksgiving 2012 that took down half the internet
  14. Tufte’s three principles of data density: 1. above all else, show the data; 2. maximize the data-ink ratio; 3. erase non-data ink. basically, make every pixel mean something
  15. d3: + lots of good tools for simple data; + almost an all-in-one solution; + many prefab “layouts”; + fluent interface (like jQuery); ± uses SVG; - data “joins” are a little weird; - fluent interface isn’t always predictable. SVG inserts a DOM node per shape, so if you’ve got >50k data points, consider...
  16. canvas: + single DOM node; - QuickDraw-like API; - lack of comprehensive docs; - renders to pixels, no zooming (also, blurry on Retina: first-world problem, I know); - doesn’t necessarily give performance gains. performance-tuning rendering: changing render contexts is expensive
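Slide 3 credits statsd/graphite with making data easy to store. As an illustration of why, statsd accepts plain-text lines of the form `<name>:<value>|ms` for timings, sent over UDP (port 8125 by default). This is a minimal sketch under that assumption; the helper names and the metric name `cf.deploy.java` are hypothetical, and this is not how stopwatch itself records data.

```python
import socket


def timing_line(metric: str, millis: float) -> str:
    """Format one statsd timing sample as '<metric>:<value>|ms'."""
    return f"{metric}:{round(millis)}|ms"


def send_timing(metric: str, millis: float,
                host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget one timing sample to a statsd daemon over UDP.

    UDP has no connection or acknowledgement, so a missing or slow
    daemon never slows down the code being measured. statsd then
    aggregates the samples per flush interval before passing summaries
    on to graphite, which is part of why the raw detail is hard to get
    back out later.
    """
    payload = timing_line(metric, millis).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Usage would look like `send_timing("cf.deploy.java", 1834.6)` (hypothetical metric), which emits the line `cf.deploy.java:1835|ms`.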
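Slides 7 and 8 warn that percentiles and other summaries hide the shape of the data. A small made-up illustration: two latency samples with the same mean and the same median, one steady and one bimodal (half the requests fast, half nearly twice as slow). The numbers are invented for the example, not taken from the talk; the crude text histogram is the "let the eye see it" part.

```python
from statistics import mean, median

steady = [100] * 10             # every request takes ~100 ms
bimodal = [10] * 5 + [190] * 5  # half very fast, half very slow


def text_histogram(samples, width=50):
    """Crude histogram: one row per `width`-ms bucket, one '#' per sample."""
    rows = {}
    for s in samples:
        bucket = (s // width) * width
        rows[bucket] = rows.get(bucket, 0) + 1
    return "\n".join(f"{b:>4} ms | {'#' * n}" for b, n in sorted(rows.items()))


for name, data in (("steady", steady), ("bimodal", bimodal)):
    # The summary lines are identical; only the histograms differ.
    print(f"{name}: mean={mean(data)} median={median(data)}")
    print(text_histogram(data))
```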
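Slide 9's "break out by facets" means splitting one undifferentiated stream of timings along dimensions such as component, operation and provider, so that a single failing combination stands out. A hypothetical sketch: the `Event` fields and the sample values are invented; only the component and operation names echo the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    component: str   # e.g. "appRegistry", "dispatcher"
    operation: str   # e.g. "resolve service", "import service"
    provider: str    # e.g. "aws", "rackspace"
    ms: float        # observed duration
    ok: bool         # did the call succeed?


def break_out(events, *facets):
    """Group events by the named facet fields and summarize each group."""
    groups = defaultdict(list)
    for e in events:
        groups[tuple(getattr(e, f) for f in facets)].append(e)
    return {
        key: {
            "count": len(es),
            "failures": sum(not e.ok for e in es),
            "max_ms": max(e.ms for e in es),
        }
        for key, es in groups.items()
    }


events = [
    Event("appRegistry", "resolve service", "aws", 40, True),
    Event("appRegistry", "resolve service", "rackspace", 55, True),
    Event("dispatcher", "import service", "aws", 300, True),
    Event("dispatcher", "import service", "rackspace", 30000, False),
]

for key, summary in sorted(break_out(events, "component", "provider").items()):
    print(key, summary)
```

Grouping by `"component"` alone would only say that the dispatcher sometimes fails; adding `"provider"` as a second facet pins it to Rackspace.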
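Slide 11's idea of observing from the outside amounts to wrapping an operation you cannot instrument in an external stopwatch with a deadline. A minimal sketch, assuming the operation can be called as a plain Python function; `observe` and its result fields are hypothetical names, not stopwatch's API. Python cannot kill the worker thread, so a timed-out call keeps running in the background; the observer only stops waiting for it.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def observe(label, fn, deadline_s):
    """Run fn() and record what an outside observer would see."""
    pool = ThreadPoolExecutor(max_workers=1)
    started = time.time()        # wall-clock start, for lining events up
    t0 = time.perf_counter()     # monotonic clock, for the duration
    future = pool.submit(fn)
    try:
        future.result(timeout=deadline_s)
        outcome = "ok"
    except FutureTimeout:
        outcome = "timeout"
    except Exception as exc:     # the operation itself raised
        outcome = f"error: {type(exc).__name__}"
    finally:
        # Don't block on a hung call; the worker thread finishes on its own.
        pool.shutdown(wait=False)
    return {
        "label": label,
        "started": started,
        "elapsed_s": time.perf_counter() - t0,
        "outcome": outcome,
    }
```

Recording the wall-clock start alongside each outcome is what later lets slowness in otherwise unrelated components be lined up in time.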
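Slide 12's "drill into the buckets" suggests bucketing observations by time first, then opening up the worst bucket to inspect individual events. A hypothetical sketch: the 60-second bucket width, the `(timestamp, label, ok)` tuples and the sample data are all invented for illustration.

```python
from collections import defaultdict


def bucket_by_time(events, width_s):
    """Map each bucket's start time to the events that fall inside it."""
    buckets = defaultdict(list)
    for ts, label, ok in events:
        start = int(ts // width_s) * width_s
        buckets[start].append((ts, label, ok))
    return dict(buckets)


def worst_bucket(buckets):
    """Return the bucket with the most failures (earliest one on ties)."""
    return max(sorted(buckets),
               key=lambda b: sum(not ok for _, _, ok in buckets[b]))


events = [
    (0, "push ruby app", True),
    (12, "push node app", True),
    (65, "push java app", False),
    (70, "push java app", False),
    (75, "push ruby app", True),
    (130, "push java app", True),
]

buckets = bucket_by_time(events, width_s=60)
worst = worst_bucket(buckets)
for ts, label, ok in buckets[worst]:
    print(ts, label, "ok" if ok else "FAILED")
```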
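Slide 14's second principle has a precise definition in Tufte's The Visual Display of Quantitative Information: the data-ink ratio is the share of a graphic's ink that actually presents data, or equivalently one minus the share that could be erased without losing any information:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1.0 - \text{proportion of a graphic that can be erased without loss of data-information}
```

"Make every pixel mean something" is this ratio pushed toward 1.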
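The "data joins" that slide 15 calls a little weird are d3's enter/update/exit pattern: new data is matched to already-bound elements by key, and every datum or element lands in exactly one of three groups. Below is a conceptual Python model of that keyed matching, not d3's API; `data_join` and the sample keys are illustrative, and d3's special handling of duplicate keys is ignored.

```python
def data_join(existing_keys, data, key):
    """Model a keyed join, split into enter, update and exit groups.

    existing_keys: keys already bound to on-screen elements
    data:          the new list of data
    key:           function mapping a datum to its key
    """
    new_keys = {key(d) for d in data}
    enter = [d for d in data if key(d) not in existing_keys]   # need new nodes
    update = [d for d in data if key(d) in existing_keys]      # reuse nodes
    exit_ = [k for k in existing_keys if k not in new_keys]    # remove nodes
    return enter, update, exit_


on_screen = ["appRegistry", "dispatcher"]
incoming = [{"name": "dispatcher", "ms": 300}, {"name": "router", "ms": 12}]
enter, update, exit_ = data_join(on_screen, incoming, key=lambda d: d["name"])
```

In d3 itself the same split comes from `selection.data(values, key)` followed by `.enter()` and `.exit()`; with SVG, every datum in the enter group becomes a new DOM node, which is where the per-shape cost on this slide comes from.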
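Slide 16's closing point, that changing render contexts is expensive, is commonly addressed by batching: draw all shapes that share a style together, so that state such as the fill style changes as rarely as possible. This sketch only counts the state changes that batching saves; the `(fill, x)` shape list is invented, and real canvas code would set `ctx.fillStyle` once per group instead of once per shape.

```python
def state_changes(shapes):
    """Count how often the fill style must change when drawing in order."""
    changes, current = 0, None
    for fill, _ in shapes:
        if fill != current:
            changes += 1
            current = fill
    return changes


def batched(shapes):
    """Reorder shapes so that all shapes sharing a fill are drawn together.

    Only safe when draw order doesn't matter (e.g. non-overlapping marks).
    sorted() is stable, so order within each fill group is preserved.
    """
    return sorted(shapes, key=lambda s: s[0])


# (fill, x) pairs: failures (red) interleaved with successes (green)
shapes = [("green", x) if x % 3 else ("red", x) for x in range(12)]
print(state_changes(shapes), "->", state_changes(batched(shapes)))
```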