Dale Humby
October 04, 2014

Monitoring distributed systems with Google Big Query and R

This talk explores continuous delivery and how Nomanini monitors daily firmware upgrades of 1,000 devices spread across Africa, using Google Big Query and R, the open-source statistics package.

Presented at GDG Stellenbosch.
Accompanying blog post at http://goo.gl/Cwai02

Transcript

  1. Monitoring distributed systems with Google Big Query and R

    Dale Humby, CTO, Nomanini; Google Developer Expert for Cloud Platform
  8. JSON: Event counters

    {
      "counts": {
        "ERROR": 7,
        "WARNING": 1475,
        "DEBUG": 362754,
        "[E].EventsManager.423": 2,
        "[E].GPSManager.259": 1,
        "[E].SlaveCommsDispatcher.158": 4
      },
      "firmwareVersion": "4264-548923b591c6",
      "startTime": "2014-09-22 00:00:01.152",
      "endTime": "2014-09-23 00:00:06.574"
    }
  9. JSON: Event counters (histogram and timing)

    {
      "counts": {
        "SignalStrength.Percent.20-40": 18,
        "SignalStrength.Percent.40-60": 12,
        "SignalStrength.Percent.60-80": 15,
        "SignalStrength.Percent.80-100": 1,
        ...
        "GPRS.TimeToConnect.Seconds.0-20": 2
      },
      ...
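
The bucketed keys on slide 9 suggest that each reading is counted into a fixed-width range. A minimal Python sketch of that idea follows. The helper `bucket_key`, the 20-unit width, the `upper` limit, and the treatment of boundary values are all assumptions for illustration, not the device firmware.

```python
from collections import Counter


def bucket_key(prefix: str, value: float, width: int = 20, upper: int = 100) -> str:
    """Return a counter key such as 'SignalStrength.Percent.20-40'.

    Hypothetical helper: the 20-unit width matches the keys on the slide,
    but how the firmware treats boundary values is an assumption.
    """
    # Clamp so that a reading equal to `upper` lands in the last bucket.
    clamped = min(max(value, 0), upper - 1e-9)
    low = int(clamped // width) * width
    return f"{prefix}.{low}-{low + width}"


counts = Counter()
for reading in (35, 47, 100):  # made-up signal strength readings
    counts[bucket_key("SignalStrength.Percent", reading)] += 1
```

Storing only bucket counts keeps the daily payload small while still letting a query rebuild a histogram per firmware version.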
  10. JSON: Event counters

    {
      "counts": {
        "ERROR": 7,
        "WARNING": 1475,
        "DEBUG": 362754,
        "[E].EventsManager.423": 2,
        "[E].GPSManager.259": 1,
        "[E].SlaveCommsDispatcher.158": 4
      },
      "firmwareVersion": "4264-548923b591c6",
      "startTime": "2014-09-22 00:00:01.152",
      "endTime": "2014-09-23 00:00:06.574"
    }
  11. SQL: Create View

    SELECT
      JSON_EXTRACT_SCALAR(event_data, '$.firmwareVersion') AS firmware_version,
      device_id,
      SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.ERROR'))) AS errors,
      SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.DEBUG'))) AS debugs,
      1e6 * SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.ERROR')))
          / SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.DEBUG'))) AS error_rate
    FROM [nomanini.event_log]
    WHERE event = 'Counters'
    GROUP BY firmware_version, device_id
    ORDER BY firmware_version DESC, device_id DESC;
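
To make the `error_rate` column concrete, the sketch below applies the same formula, `1e6 * errors / debugs`, to the single payload shown on slide 8. The function name `error_rate_per_million` is hypothetical, and the view itself sums counters over every payload per firmware version and device, whereas this handles only one payload.

```python
import json

# Counter values copied from the slide 8 payload (per-site counters omitted).
payload = json.loads("""
{
  "counts": {"ERROR": 7, "WARNING": 1475, "DEBUG": 362754},
  "firmwareVersion": "4264-548923b591c6"
}
""")


def error_rate_per_million(counts: dict) -> float:
    """Errors per million DEBUG lines, mirroring 1e6 * errors / debugs."""
    return 1e6 * counts["ERROR"] / counts["DEBUG"]


rate = error_rate_per_million(payload["counts"])
print(f"{payload['firmwareVersion']}: {rate:.1f} errors per million DEBUG lines")
```

Dividing by the DEBUG count normalises for how much a device logged in the period, so a busy device and an idle one can be compared on the same scale.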
  12. R: Broken Beta release

    > t.test(log10(error_rate.beta), log10(error_rate.stable), paired=TRUE)

            Paired t-test

    data:  log10(error_rate.beta) and log10(error_rate.stable)
    t = 2.8624, df = 28, p-value = 0.007872
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.1131325 0.6825117
    sample estimates:
    mean of the differences
                  0.3978221

    Geometric mean of the differences (base 10): 2.499321
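
The paired t statistic on slide 12 is the mean of the per-device log10 differences divided by its standard error. The sketch below computes that statistic with the Python standard library only. The function `paired_log_t` and the four-device data set are made up for illustration, and the p-value is left out because it needs the t distribution.

```python
import math
import statistics


def paired_log_t(beta, stable):
    """Return (t statistic, degrees of freedom, geometric mean ratio)."""
    # One log10 difference per device: beta rate versus stable rate.
    diffs = [math.log10(b) - math.log10(s) for b, s in zip(beta, stable)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    # 10 ** mean_diff is the geometric mean of the beta/stable ratios.
    return mean_diff / std_err, n - 1, 10 ** mean_diff


# Made-up error rates for four devices, not data from the talk.
stable = [10.0, 20.0, 40.0, 80.0]
beta = [20.0, 80.0, 40.0, 320.0]
t_stat, df, ratio = paired_log_t(beta, stable)
```

Applied to the slide's own figures, 10 ** 0.3978221 is about 2.499, which is the reported geometric mean: on average the beta error rate was roughly 2.5 times the stable one on the same device.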
  13. R: Broken Beta release

    > wilcox.test(error_rate.beta, error_rate.stable, paired=TRUE, conf.int=TRUE)

            Wilcoxon signed rank test

    data:  error_rate.beta and error_rate.stable
    V = 318, p-value = 0.02906
    alternative hypothesis: true location shift is not equal to 0
    95 percent confidence interval:
         7.075123 14546.761849
    sample estimates:
    (pseudo)median
          183.0312
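
The `V` reported on slide 13 is the sum of the ranks of the positive paired differences, where ranks are assigned by absolute difference. The sketch below shows that calculation in plain Python. The function `signed_rank_v` and its inputs are illustrative; it drops zero differences and averages tied ranks, and it does not compute the p-value or confidence interval.

```python
def signed_rank_v(beta, stable):
    """Sum of ranks of positive differences (the V statistic)."""
    # Zero differences carry no sign information and are dropped.
    diffs = [b - s for b, s in zip(beta, stable) if b != s]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        # Find the run of tied absolute differences starting at pos.
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        average_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return sum(rank for rank, d in zip(ranks, diffs) if d > 0)


# Made-up rates: differences are 1, -3, 8 and one zero (dropped).
v = signed_rank_v([5, 3, 10, 8], [4, 6, 2, 8])
```

Because it uses ranks rather than raw values, this test is less sensitive than the t-test to a few devices with extreme error rates.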
  14. JSON: Event counters on broken Beta

    {
      "counts": {
        "ERROR": 1937,
        "WARNING": 1427,
        "DEBUG": 26319,
        "[E].SlaveUpdateManager.442": 1912,
        "[E].HTTPSManager.319": 3,
        "[E].DOTAManager.511": 1,
        ...
      },
      "masterVersion": "3976-aff309f9d073",
      "startTime": "2014-07-09 09:33:57.923",
      "endTime": "2014-07-09 15:02:37.032"
    }
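
Once the statistics flag a release, the per-site counters in the broken Beta payload on slide 14 point to where the errors come from. The sketch below picks the largest `[E].` counter and its share of all errors. Treating `[E].` keys as per-site error counters is an assumption, `top_error_site` is a hypothetical helper, and the entries elided on the slide are left out rather than guessed.

```python
counts = {
    "ERROR": 1937,
    "WARNING": 1427,
    "DEBUG": 26319,
    "[E].SlaveUpdateManager.442": 1912,
    "[E].HTTPSManager.319": 3,
    "[E].DOTAManager.511": 1,
    # The slide elides further entries here; they are not reconstructed.
}


def top_error_site(counts: dict) -> tuple:
    """Return (key, share of all errors) for the largest '[E].' counter."""
    sites = {k: v for k, v in counts.items() if k.startswith("[E].")}
    key = max(sites, key=sites.get)
    return key, sites[key] / counts["ERROR"]


site, share = top_error_site(counts)
print(f"{site} accounts for {share:.1%} of errors")
```

For this payload a single site, `[E].SlaveUpdateManager.442`, accounts for 1912 of the 1937 errors.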