stopwatch - let the data tell a story

Matthew Lyon
February 13, 2013

A talk at the Portland Data Visualization group [1] on an internal tool I built for troubleshooting CloudFoundry installations across five datacenters on three continents.

[1]: https://groups.google.com/forum/?fromgroups#!forum/pdx-visualization

The type family used is Adobe's free Source Code Pro: http://store1.adobe.com/cfusion/store/html/index.cfm?event=displayFontPackage&code=1959


Transcript

  1. harnessing the power of the eye to help direct troubleshooting efforts across a distributed & service-oriented architecture. stopwatch (will have a better name before open-sourcing. maybe Hermes or BatmanSonar or something)
  2. Matthew Lyon, @mattly. AppFog: Platform as a Service by and for developers
  3. PHP Fog our first product started as a prototype

  4. symptoms of a problem: lots of things are randomly very slow or fail without a seeming connection to each other, except that it tends to happen at the same time
  5. troubleshooting. statsd/graphite: + easy to store data; + stores right data; - hard to get at data; - complex API; - wasn’t helping me pinpoint the problem; - doesn’t highlight relationships. tailing log files: + got a lot of raw data; + helped understand the interaction between system components; - cost sanity and time
  6. 1 month finding root cause: gitolite on EBS. 1 day writing code to replace gitolite at my in-laws. in Yakima, Washington. on Thanksgiving. 1 month convincing people I was right before high-risk deploy
  7. cloudfoundry: open-source Platform-as-a-Service toolkit created by VMware; launched with Ruby, Java, Node.js runtimes; we contributed PHP runtime support; it now also runs Python and Erlang (and if you wanted perl, it wouldn’t be hard to add); they built it to run on vSphere; we run it on AWS and others
  8. cloudfoundry

  9. we run five of those on three continents

  10. Hey it’s really slow right now. can you take a look? Sunday, 7:04am *sigh* on it.
  11. observe & measure hey, there’s (one of the) problem(s)!

  12. numbers are great, but you suck at stats, especially if you’re not aware you do, unless perhaps you’re German
  13. averages lie, especially in comparisons; failures, times; what you want is the distribution
  14. the human eye can quickly make sense of a lot of data, but percentiles don’t tell the whole story and summaries lie too
  15. None
  16. break out by facets. appRegistry: resolve service via database; dispatcher: import service on rackspace (oh and guess what? this one failed)
  17. <demo>

  18. we run the largest installations of cloudfoundry; some bugs only manifest at the edges; the only one run as a pay-for service on the public cloud (that is, AWS, Rackspace Cloud, etc)
  19. if you can’t measure from the inside then observe it from the outside: cf’s deploy mechanism had timeout issues, particularly with AWS/ELB and large apps
  20. make it obvious that something is wrong from across the room. EBS... again
  21. the site is unresponsive? failures gonna propagate

  22. cloudfoundry academy, lesson 1: uncaught exceptions will kill you

  23. summarize into buckets to help find the pain points

  24. drill into the buckets to figure out what’s wrong: in this case, the culprit was a new edge case in creating java apps
  25. staying on top of network problems

  26. None
  27. AWS US-East is having problems... again, with EBS... again. this is the AWS outage around Thanksgiving 2012 that took down half the internet
  28. None
  29. None
  30. None
  31. a quick tour of how I learned to draw

  32. Tufte’s three principles of data density: 1. Above all else, show the data; 2. Maximize the data-to-ink ratio; 3. Erase non-data ink. basically, make every pixel mean something
  33. d3: + lots of good tools for simple data; + almost an all-in-one solution; + many prefab “layouts”; + fluent interface (ie, jQuery); ? uses svg; - data “joins” are a little weird; - fluent interface isn’t always predictable. svg inserts a dom node per shape; if you’ve got >50k data points, consider...
  34. canvas: + single dom-node; - quickdraw-like API; - lack of comprehensive docs; - renders to pixels, no zooming; also, blurry on Retina (first world problem, I know); - doesn’t necessarily give performance gains. performance-tuning rendering: changing render contexts is expensive
  35. invert data by style; use beginPath() and fill() sparingly (yes, this is coffeescript)
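
Slides 13, 14 and 23 argue that averages hide what the distribution shows. A minimal sketch of that point in Python follows. The two timing samples are invented for illustration and are not data from the talk; `histogram` and its `bucket_width` parameter are hypothetical names, not part of stopwatch.

```python
from collections import Counter
from statistics import mean, median

# Invented response times in milliseconds (illustrative only).
steady = [100] * 10          # every request takes about the same time
spiky = [20] * 9 + [820]     # mostly fast, with one very slow outlier


def histogram(samples, bucket_width):
    """Count samples per bucket, keyed by each bucket's lower bound."""
    return dict(Counter((s // bucket_width) * bucket_width for s in samples))


# Both samples share the same average, so the mean cannot tell them apart.
same_average = mean(steady) == mean(spiky)

# The median and the bucketed counts show how different they really are.
steady_buckets = histogram(steady, 100)
spiky_buckets = histogram(spiky, 100)

print("same average:", same_average)
print("medians:", median(steady), median(spiky))
print("steady buckets:", steady_buckets)
print("spiky buckets:", spiky_buckets)
```

Drawing the bucket counts rather than printing them is what lets the eye do the comparison, which is the approach the slides describe.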
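
The code shown on slide 35 is not part of this transcript. As a rough illustration of "invert data by style", the sketch below groups points by fill style first, so each style is set once and each group is filled with a single path. It is written in Python rather than CoffeeScript, and `RecordingContext` is a hypothetical stand-in for a canvas 2D context that only logs calls; its method names mirror canvas's `fillStyle`, `beginPath()`, `rect()` and `fill()`, and everything else here is an assumption.

```python
from collections import defaultdict


class RecordingContext:
    """Hypothetical stand-in for a canvas 2D context; it only logs calls."""

    def __init__(self):
        self.calls = []

    def set_fill_style(self, style):
        self.calls.append(("fillStyle", style))

    def begin_path(self):
        self.calls.append(("beginPath",))

    def rect(self, x, y, w, h):
        self.calls.append(("rect", x, y, w, h))

    def fill(self):
        self.calls.append(("fill",))


def invert_by_style(points):
    """Turn [(x, y, style), ...] into {style: [(x, y), ...]}."""
    groups = defaultdict(list)
    for x, y, style in points:
        groups[style].append((x, y))
    return dict(groups)


def draw_batched(ctx, points, size=2):
    """Set each style once, add all its shapes to one path, fill once."""
    for style, coords in invert_by_style(points).items():
        ctx.set_fill_style(style)
        ctx.begin_path()
        for x, y in coords:
            ctx.rect(x, y, size, size)
        ctx.fill()


# Four points that alternate between two styles.
points = [(0, 0, "red"), (5, 0, "green"), (10, 0, "red"), (15, 0, "green")]
ctx = RecordingContext()
draw_batched(ctx, points)

fills = sum(1 for call in ctx.calls if call[0] == "fill")
style_changes = sum(1 for call in ctx.calls if call[0] == "fillStyle")
print(fills, style_changes)
```

Drawing the same four points in their original order would change the style four times and fill four times; grouping first brings both counts down to the number of distinct styles.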