Slide 1

Slide 1 text

Unlocking the Potential of Data

Slide 2

Slide 2 text

Karthik Kastury
Software Engineer - PayPal India
Engineer / Hacker / Blogger / Apple fanboi and evil minded. *Pun Intended*
24x7 Internet Geek | Unlucky in Cards
Internet Addict for 12+ Years...
Twitter: @KarthikDot
Email: [email protected]

Slide 3

Slide 3 text

measure everything.
http://codeascraft.com/2011/02/15/measure-anything-measure-everything/

Slide 4

Slide 4 text

Measure Everything!
‣ User actions (signups, logins, other app actions)
‣ Application events (exceptions, deploys, perf. metrics, SQL query perf.)
‣ Business metrics (active users, revenue)
‣ Internal metrics (build times, deploy times)

Slide 5

Slide 5 text

Why?
‣ Cheap, effective and scalable.
‣ Power to the developers/product managers to measure anything and everything they like.
‣ As important as a QA team. Know what your app is doing in production.

Slide 6

Slide 6 text

Benefits
‣ Total visibility into your app’s behavior in production.
‣ No more “Hey it works fine on my machine!”
‣ Feedback loops are extremely short, so the time between making a change and seeing things go bad is super *small*.
‣ Quickly find the needle in the haystack when things mess up!

Slide 7

Slide 7 text

StatsD
https://github.com/etsy/statsd/

Slide 8

Slide 8 text

StatsD
‣ Open source project from the good folks at Etsy.
‣ Node.js server implementation, with support for multiple backends. Server implementations also exist in C++, Python and Ruby.
‣ Clients in *almost* every language.

Slide 9

Slide 9 text

What is StatsD?
‣ A network daemon that listens for statistics and aggregates them on the fly.
‣ Multiple types of metrics: counters, timers, gauges.
‣ Default backend is Graphite, but it can send data downstream to all sorts of backends.
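
To make the three metric types concrete, here is a minimal sketch of a client that encodes them in StatsD's plain-text line format (`name:value|type`) and sends them over UDP. The helper names `format_metric` and `send_metric` are hypothetical, and the address in the usage note is an assumption about where a daemon might be listening.

```python
import socket

# StatsD type suffixes: "c" = counter, "ms" = timer, "g" = gauge.
TYPE_SUFFIX = {"counter": "c", "timer": "ms", "gauge": "g"}


def format_metric(name: str, value, metric_type: str) -> bytes:
    """Encode one metric as a StatsD line: <name>:<value>|<type>."""
    return f"{name}:{value}|{TYPE_SUFFIX[metric_type]}".encode("ascii")


def send_metric(sock: socket.socket, address, name, value, metric_type) -> None:
    """Fire-and-forget: one UDP datagram per metric, no reply expected."""
    sock.sendto(format_metric(name, value, metric_type), address)


if __name__ == "__main__":
    # Print the wire format instead of sending, so this runs without a daemon.
    examples = [
        ("accounts.auth.password.attempted", 1, "counter"),
        ("db.query.duration", 320, "timer"),   # hypothetical metric name
        ("queue.depth", 42, "gauge"),          # hypothetical metric name
    ]
    for name, value, metric_type in examples:
        print(format_metric(name, value, metric_type).decode("ascii"))
```

To actually send, one would create `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)` and call `send_metric(sock, ("127.0.0.1", 8125), ...)`, assuming a daemon on the conventional port 8125.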

Slide 10

Slide 10 text

Metrics examples
accounts.auth.password.attempted
accounts.auth.password.succeeded
accounts.auth.password.failed
accounts.auth.password.failure.invalid_email
accounts.auth.password.failure.incorrect_password
metric nesting
accounts.auth.password.failure.*
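
A rough sketch of how a nested wildcard such as `accounts.auth.password.failure.*` selects metrics. It assumes Graphite-style semantics, where `*` matches within a single dot-separated segment; the `matches` helper is hypothetical and only illustrates the idea, it is not how any backend implements lookups.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, metric: str) -> bool:
    """Match a dotted metric name against a pattern, segment by segment."""
    pattern_parts = pattern.split(".")
    metric_parts = metric.split(".")
    # A wildcard stands in for exactly one segment, so depths must agree.
    if len(pattern_parts) != len(metric_parts):
        return False
    return all(fnmatchcase(m, p) for p, m in zip(pattern_parts, metric_parts))


METRICS = [
    "accounts.auth.password.attempted",
    "accounts.auth.password.succeeded",
    "accounts.auth.password.failed",
    "accounts.auth.password.failure.invalid_email",
    "accounts.auth.password.failure.incorrect_password",
]

if __name__ == "__main__":
    for metric in METRICS:
        if matches("accounts.auth.password.failure.*", metric):
            print(metric)
```

Only the two `failure.*` metrics are selected, which is why putting the failure reason one level deeper makes it easy to graph all failure causes together.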

Slide 11

Slide 11 text

Metrics Aggregation
‣ StatsD flushes all the metrics in an aggregated fashion at periodic intervals.
‣ The backend only receives the aggregated metrics.
‣ Most backends can further aggregate data into batches of time.
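
The aggregate-then-flush idea can be sketched in a few lines. This is a simplified illustration, not StatsD's actual code: the `Aggregator` class and the `.count`, `.mean` and `.upper` output names are hypothetical, and it only handles counters (`|c`) and timers (`|ms`) in the plain-text `name:value|type` line format.

```python
from collections import defaultdict
from statistics import mean


class Aggregator:
    """Collects raw metric lines and emits one summary per flush."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.timers = defaultdict(list)

    def record(self, line: str) -> None:
        """Parse a '<name>:<value>|<type>' line and fold it into the buffers."""
        name, rest = line.split(":", 1)
        value, metric_type = rest.split("|", 1)
        if metric_type == "c":
            self.counters[name] += int(value)
        elif metric_type == "ms":
            self.timers[name].append(float(value))

    def flush(self) -> dict:
        """Return the aggregated values and reset for the next interval."""
        summary = {f"{name}.count": total for name, total in self.counters.items()}
        for name, samples in self.timers.items():
            summary[f"{name}.mean"] = mean(samples)
            summary[f"{name}.upper"] = max(samples)
        self.counters.clear()
        self.timers.clear()
        return summary


if __name__ == "__main__":
    agg = Aggregator()
    for line in ["logins:1|c", "logins:1|c", "db.query:10|ms", "db.query:30|ms"]:
        agg.record(line)
    print(agg.flush())  # the backend would only ever see this summary
```

In a real daemon `flush` would run on a timer and the summary would be written to the backend; here it is called by hand.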

Slide 12

Slide 12 text

Backend Configuration
‣ Configure your stats backend to capture mean, median, 75th, 90th, 95th and 99th percentile data.
‣ Just capturing mean/median is generally not recommended.
‣ Pro tip! Don’t do DNS resolution while sending stats to StatsD or to the backend.
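
A small worked example of why mean and median alone can mislead. The latency numbers are made up for illustration, and `percentile` is a hypothetical helper using the nearest-rank method; real backends may interpolate differently.

```python
import math
from statistics import mean


def percentile(samples, pct: int):
    """Nearest-rank percentile: smallest sample with pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up data: 90 fast requests at 100 ms and 10 slow ones at 2000 ms.
LATENCIES_MS = [100] * 90 + [2000] * 10

if __name__ == "__main__":
    print("mean:", mean(LATENCIES_MS))
    for pct in (50, 75, 90, 95, 99):
        print(f"p{pct}:", percentile(LATENCIES_MS, pct))
```

With this data the median still reports 100 ms and the mean lands somewhere no request actually was, while the 95th and 99th percentiles expose the 2000 ms tail that one in ten users is hitting.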

Slide 13

Slide 13 text

does it scale?

Slide 14

Slide 14 text

yes

Slide 15

Slide 15 text

Scalability
‣ A single StatsD server can receive tens of thousands of events per second on a decently powered machine.
‣ Horizontally scalable. You can shard metrics depending on their namespace and send them to different StatsD instances.
‣ StatsD receives metrics over UDP rather than TCP.
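
One possible way to shard by namespace, sketched under assumptions: the namespace is taken to be the first dotted segment, the shard addresses are made-up IP literals (so no DNS lookup happens on the send path, in the spirit of the earlier pro tip), and `pick_shard` and `send` are hypothetical helpers.

```python
import socket
import zlib

# Hypothetical StatsD instances, given as IP literals to avoid DNS at send time.
SHARDS = [("10.0.0.1", 8125), ("10.0.0.2", 8125), ("10.0.0.3", 8125)]


def namespace_of(metric: str) -> str:
    """Treat the first dotted segment as the namespace (an assumption)."""
    return metric.split(".", 1)[0]


def pick_shard(metric: str, shards=SHARDS):
    """Map a metric to a shard so that a whole namespace lands on one instance."""
    # crc32 is stable across processes; Python's built-in hash() of str is not.
    key = namespace_of(metric).encode("utf-8")
    return shards[zlib.crc32(key) % len(shards)]


def send(sock: socket.socket, line: str) -> None:
    """Send one 'name:value|type' line over UDP to the shard owning its namespace."""
    metric = line.split(":", 1)[0]
    sock.sendto(line.encode("ascii"), pick_shard(metric))


if __name__ == "__main__":
    for metric in ("accounts.auth.password.failed", "accounts.signup", "billing.charge"):
        print(metric, "->", pick_shard(metric))
```

Keeping a namespace on a single instance matters because each StatsD server aggregates independently; splitting one metric across servers would produce several partial aggregates for the same name.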

Slide 16

Slide 16 text

Metric Roll Ups
‣ Graphite and most other backends let you roll up data to lower precisions as you go.
‣ StatsD by default flushes aggregated metrics to the backend every 10s.
‣ Here’s my roll-up scheme: 10s, 1min, 5min, 60min, 24hr, 30days, 1year
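
As an illustration of what a roll-up does, the sketch below collapses 10-second data points into 1-minute buckets. Averaging is an assumption made here for simplicity; for counts, summing would usually be the appropriate method, and the `roll_up` helper is hypothetical rather than any backend's API.

```python
from statistics import mean


def roll_up(points, bucket_seconds: int):
    """Collapse (timestamp, value) points into coarser buckets by averaging.

    Each point is assigned to the bucket whose start is the timestamp
    rounded down to a multiple of bucket_seconds.
    """
    buckets = {}
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets.setdefault(bucket_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


if __name__ == "__main__":
    # Eight 10-second points: six in the first minute, two in the second.
    ten_second_points = [(0, 1), (10, 2), (20, 3), (30, 4), (40, 5), (50, 6),
                         (60, 10), (70, 20)]
    print(roll_up(ten_second_points, 60))
```

Applying the same function again with larger buckets (5 minutes, 60 minutes and so on) gives progressively coarser series, which is the trade of precision for storage that a roll-up scheme describes.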

Slide 17

Slide 17 text

Metrics Backends
‣ You can send data to one or multiple backends from StatsD.
‣ Graphite is the most popular and the default backend for StatsD.
http://graphite.wikidot.com/
‣ Commercial alternatives like InstrumentalApp and Librato Metrics exist.
https://instrumentalapp.com/
https://metrics.librato.com/

Slide 18

Slide 18 text

measure everything.

Slide 19

Slide 19 text

measure everything, every time.

Slide 20

Slide 20 text

data visualization
Dashboards
Alerts

Slide 21

Slide 21 text

Dashboards? Graphs?
Librato Metrics Dashboard - RubyGems Bundler API
https://metrics.librato.com/share/dashboards/k4b5bhm8

Slide 22

Slide 22 text

Dashboards (Open Source)
‣ Graphite lets you build dashboards fairly easily, combining different metrics in any way you want.
‣ But Graphite’s graphs are kinda boring!
‣ A whole lot of open source dashboard/graphing solutions that work with Graphite exist.

Slide 23

Slide 23 text

Graphite - graphite.wikidot.com/start
Giraffe - giraffe.kenhub.com
Graphiti - github.com/paperlesspost/graphiti
Tasseo - github.com/obfuscurity/tasseo
Dashing - shopify.github.io/dashing
GDash - github.com/ripienaar/gdash
Graphene - jondot.github.io/graphene

Slide 24

Slide 24 text

Dashboards (Commercial)
‣ StatsD supports various other backends to which it can send data.
‣ InstrumentalApp is a wonderful choice if you want to build dashboards quickly.
https://instrumentalapp.com/
‣ Librato Metrics is *even* more awesome. Dashboards + alerts and notifications via email / PagerDuty etc.
https://metrics.librato.com/

Slide 25

Slide 25 text

realtime versus delayed

Slide 26

Slide 26 text

Realtime
‣ Extra pressure: you tend to over-measure every small change.
‣ Never make product decisions on a few hours’ worth of data. Wait a few days for patterns to kick in.
‣ Realtime makes sense for key business metrics, application metrics and infrastructure.
‣ Divorce product and operational metrics.

Slide 27

Slide 27 text

Delayed
‣ Product metrics are best observed over a period of time.
‣ Capture them in realtime, but make change decisions only after a significant amount of data has been collected.
‣ Realtime makes sense when things go south!

Slide 28

Slide 28 text

realtime versus delayed
Database Metrics / Application Metrics / Errors and Exceptions / Other Operational Metrics
Revenue / User Conversion / Key Business Metrics / Other Product Metrics
divorce product & operational metrics!

Slide 29

Slide 29 text

success stories

Slide 30

Slide 30 text

Success Stories
‣ A major part of metric collection at Etsy happens via StatsD, across all kinds of operational & product use cases.
‣ Shopify uses StatsD to figure out how their internal caching systems are performing, and to debug issues.
‣ 37signals, the folks behind Basecamp, have a custom Golang implementation of the StatsD server to capture app metrics. It handles about 150k metrics/sec.

Slide 31

Slide 31 text

Test Driven Development

Slide 32

Slide 32 text

Behavior Driven Development
Test Driven Development

Slide 33

Slide 33 text

Documentation Driven Development
Behavior Driven Development
Test Driven Development

Slide 34

Slide 34 text

Documentation Driven Development
Behavior Driven Development
Test Driven Development
Metrics Driven Development *
* I didn’t invent this!

Slide 35

Slide 35 text

measure everything, every time.

Slide 36

Slide 36 text

thank you!
twitter: @KarthikDot
email: [email protected] [email protected]
slides: https://speakerdeck.com/karthikdot/unlocking-the-potential-of-data
http://bit.ly/5elunlock2013