Monitoring distributed systems with Google Big Query and R

Dale Humby
October 04, 2014

This talk explores continuous delivery, and how Nomanini monitors daily firmware upgrades of 1000 devices spread across Africa, using Google Big Query and the open source stats package, R.

Presented at GDG Stellenbosch.
Accompanying blog post at http://goo.gl/Cwai02

Transcript

  1. GDE
    Monitoring distributed systems
    with Google Big Query and R
    Dale Humby
    CTO, Nomanini
    Google Developer Expert for Cloud Platform

  2. View Slide

  3. micro-transactions in emerging markets

  4. GDE

  5. GDE

  6. GDE

  7. GDE

  8. GDE

  9. GDE
    Continuous Delivery

  10. GDE

  11. JSON
    GDE
    Event counters
    {
      "counts": {
        "ERROR": 7,
        "WARNING": 1475,
        "DEBUG": 362754,
        "[E].EventsManager.423": 2,
        "[E].GPSManager.259": 1,
        "[E].SlaveCommsDispatcher.158": 4
      },
      "firmwareVersion": "4264-548923b591c6",
      "startTime": "2014-09-22 00:00:01.152",
      "endTime": "2014-09-23 00:00:06.574"
    }
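
As a rough illustration of how a payload like this could be produced, here is a minimal Python sketch. The `EventCounters` class and its methods are hypothetical, the reading of `[E].<Module>.<Line>` as "module name and source line of the error" is an assumption, and the actual device firmware is not shown in the talk.

```python
import json
from collections import Counter


class EventCounters:
    """Hypothetical accumulator for one reporting window of log counters."""

    def __init__(self, firmware_version, start_time):
        self.firmware_version = firmware_version
        self.start_time = start_time
        self.counts = Counter()

    def record(self, level, module=None, line=None):
        # Every log line bumps its level counter (ERROR, WARNING, DEBUG).
        self.counts[level] += 1
        # Assumption: errors are also keyed by call site, e.g. "[E].GPSManager.259".
        if level == "ERROR" and module is not None and line is not None:
            self.counts["[E].%s.%d" % (module, line)] += 1

    def to_json(self, end_time):
        # Same top-level keys as the payload on the slide.
        return json.dumps({
            "counts": dict(self.counts),
            "firmwareVersion": self.firmware_version,
            "startTime": self.start_time,
            "endTime": end_time,
        })


counters = EventCounters("4264-548923b591c6", "2014-09-22 00:00:01.152")
counters.record("DEBUG")
counters.record("ERROR", module="GPSManager", line=259)
payload = json.loads(counters.to_json("2014-09-23 00:00:06.574"))
```

Keeping only counters on the device, rather than full log lines, keeps the daily upload small while still identifying which call site is failing.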

  12. JSON
    GDE
    Event counters
    Histogram and timing
    {
      "counts": {
        "SignalStrength.Percent.20-40": 18,
        "SignalStrength.Percent.40-60": 12,
        "SignalStrength.Percent.60-80": 15,
        "SignalStrength.Percent.80-100": 1,
        ...
        "GPRS.TimeToConnect.Seconds.0-20": 2
      },
      ...
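
The bucketed keys above can be generated from raw readings with a small helper. The following Python sketch is an assumption about how that might look: `bucket_key` is a hypothetical name, and the treatment of boundary values (a reading of exactly 40 goes into `40-60`, and 100 is clamped into `80-100`) is a guess, since the slide does not say.

```python
from collections import Counter


def bucket_key(metric, value, width, upper=None):
    """Return a histogram counter key such as 'SignalStrength.Percent.20-40'."""
    low = int(value // width) * width
    if upper is not None:
        # Clamp so that the maximum reading falls in the last bucket.
        low = min(low, upper - width)
    low = max(low, 0)
    return "%s.%d-%d" % (metric, low, low + width)


counts = Counter()
for reading in (35, 52, 100):  # illustrative signal-strength readings
    counts[bucket_key("SignalStrength.Percent", reading, 20, upper=100)] += 1
# Timings reuse the same helper; 7.3 seconds is an illustrative value.
counts[bucket_key("GPRS.TimeToConnect.Seconds", 7.3, 20)] += 1
```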

  13. GDE
    Upload and stream to Big Query
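
The slide does not show the upload code. As a hedged sketch, the request body for BigQuery's streaming `tabledata.insertAll` call could be assembled as below. The column names `device_id`, `event` and `event_data` come from the SQL later in the deck; the function name, the sample device ID and the choice to store the counters as a JSON string are assumptions. No network call is made here.

```python
import json
import uuid


def build_insert_all_body(device_id, event, event_data):
    """Build a tabledata.insertAll request body for one diagnostics upload."""
    row = {
        "device_id": device_id,
        "event": event,
        # Stored as a JSON string so it can be queried with JSON_EXTRACT_SCALAR.
        "event_data": json.dumps(event_data),
    }
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        # insertId lets BigQuery de-duplicate retried rows on a best-effort basis.
        "rows": [{"insertId": str(uuid.uuid4()), "json": row}],
    }


body = build_insert_all_body(
    "device-0001",  # hypothetical device ID
    "Counters",
    {"counts": {"ERROR": 7, "DEBUG": 362754}, "firmwareVersion": "4264-548923b591c6"},
)
```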

  14. GDE
    Diagnostics saved in Big Query

  15. JSON
    GDE
    Event counters
    {
      "counts": {
        "ERROR": 7,
        "WARNING": 1475,
        "DEBUG": 362754,
        "[E].EventsManager.423": 2,
        "[E].GPSManager.259": 1,
        "[E].SlaveCommsDispatcher.158": 4
      },
      "firmwareVersion": "4264-548923b591c6",
      "startTime": "2014-09-22 00:00:01.152",
      "endTime": "2014-09-23 00:00:06.574"
    }

  16. SQL
    GDE
    Create View
    SELECT
      JSON_EXTRACT_SCALAR(event_data, '$.firmwareVersion') AS firmware_version,
      device_id,
      SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.ERROR'))) AS errors,
      SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.DEBUG'))) AS debugs,
      1e6 *
        SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.ERROR'))) /
        SUM(INTEGER(JSON_EXTRACT_SCALAR(event_data, '$.counts.DEBUG')))
        AS error_rate
    FROM [nomanini.event_log]
    WHERE event = 'Counters'
    GROUP BY firmware_version, device_id
    ORDER BY firmware_version DESC, device_id DESC;
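
To make the arithmetic of the view concrete, here is a small Python sketch of the same aggregation over in-memory rows: errors per million DEBUG lines, grouped by firmware version and device. The function name and the sample device ID are hypothetical, and skipping groups with zero DEBUG lines is a choice made for this sketch, not something the SQL does.

```python
import json
from collections import defaultdict


def firmware_error_rates(rows):
    """rows: dicts with device_id, event and event_data (a JSON string)."""
    totals = defaultdict(lambda: {"errors": 0, "debugs": 0})
    for row in rows:
        if row["event"] != "Counters":  # WHERE event = 'Counters'
            continue
        data = json.loads(row["event_data"])
        key = (data["firmwareVersion"], row["device_id"])  # GROUP BY
        totals[key]["errors"] += int(data["counts"].get("ERROR", 0))
        totals[key]["debugs"] += int(data["counts"].get("DEBUG", 0))
    rates = {}
    for key, total in totals.items():
        # Sketch-only choice: skip groups with no DEBUG lines to avoid dividing by zero.
        if total["debugs"]:
            rates[key] = 1e6 * total["errors"] / total["debugs"]
    return rates


sample = [{
    "device_id": "device-0001",  # hypothetical device ID
    "event": "Counters",
    "event_data": json.dumps({
        "counts": {"ERROR": 7, "DEBUG": 362754},
        "firmwareVersion": "4264-548923b591c6",
    }),
}]
rates = firmware_error_rates(sample)
```

For the counters shown on the earlier slide (7 errors, 362754 debug lines) this gives roughly 19.3 errors per million debug lines. Normalising by DEBUG count compensates for devices that were switched on for different amounts of time.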

  17. GDE
    Query View
    SELECT * FROM [nomanini.firmware_error_rates] LIMIT 10;

  18. GDE
    Stats with R

  19. GDE
    Good Beta release

  20. GDE
    Log-transform the data
    to make it more ‘Normal’
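
A quick way to see why the log transform helps is to compare skewness before and after. The Python sketch below uses made-up error rates spanning several orders of magnitude (not data from the talk) and a hypothetical `skewness` helper based on the population standard deviation.

```python
import math
import statistics


def skewness(values):
    """Population skewness: mean of cubed standardised deviations."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Illustrative error rates only; real rates must be > 0 for log10 to be defined.
error_rates = [1.0, 10.0, 100.0, 1000.0, 10000.0]
log_rates = [math.log10(r) for r in error_rates]

raw_skew = skewness(error_rates)  # strongly right-skewed
log_skew = skewness(log_rates)    # symmetric after the transform
```

Error rates that differ by multiplicative factors become evenly spaced on a log scale, which is closer to what a t-test assumes.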

  21. GDE
    Broken Beta release

  22. R
    GDE
    Broken Beta release
    > t.test(log10(error_rate.beta),
             log10(error_rate.stable),
             paired=TRUE)
        Paired t-test
    data:  log10(error_rate.beta) and log10(error_rate.stable)
    t = 2.8624, df = 28, p-value = 0.007872
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.1131325 0.6825117
    sample estimates:
    mean of the differences
                  0.3978221
    Geometric mean of the differences (base 10): 2.499321
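
For readers who want to see what the paired test computes, here is a Python sketch of the t statistic on log10 differences, using only the standard library. The function name and the toy input are assumptions; the p-value is not computed because the standard library has no t distribution. The last line back-transforms the mean difference reported on the slide.

```python
import math
import statistics


def paired_log_t(beta, stable):
    """Paired t statistic on log10(beta) - log10(stable); all rates must be > 0."""
    diffs = [math.log10(b) - math.log10(s) for b, s in zip(beta, stable)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # sample sd over sqrt(n)
    return {
        "t": mean_diff / std_err,
        "df": n - 1,
        "mean_log_diff": mean_diff,
        # Back-transform: geometric mean of the per-device beta/stable ratios.
        "ratio": 10 ** mean_diff,
    }


# Toy data, not from the talk: three devices measured on both releases.
result = paired_log_t([10.0, 100.0, 1000.0], [1.0, 1.0, 1.0])

# Back-transforming the slide's mean difference gives the reported geometric mean.
slide_ratio = 10 ** 0.3978221
```

Read this way, the slide's 2.499321 says that the Beta release logged errors at about 2.5 times the Stable rate on the same devices, in the geometric-mean sense.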

  23. R
    GDE
    Broken Beta release
    > wilcox.test(error_rate.beta,
                  error_rate.stable,
                  paired=TRUE, conf.int=TRUE)
        Wilcoxon signed rank test
    data:  error_rate.beta and error_rate.stable
    V = 318, p-value = 0.02906
    alternative hypothesis: true location shift is not equal to 0
    95 percent confidence interval:
         7.075123 14546.761849
    sample estimates:
    (pseudo)median
          183.0312
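
The `V` statistic in this output is the sum of the ranks of the positive paired differences. A minimal Python sketch of that calculation follows; `signed_rank_v` is a hypothetical name, the input is toy data, and the p-value and confidence interval that R also reports are not reproduced.

```python
def signed_rank_v(beta, stable):
    """Wilcoxon signed-rank V: sum of ranks of the positive paired differences."""
    # Zero differences are dropped before ranking.
    diffs = [b - s for b, s in zip(beta, stable) if b != s]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return sum(rank for rank, diff in zip(ranks, diffs) if diff > 0)


# Toy data, not from the talk: differences are 4, -1, 7 and one dropped zero.
v = signed_rank_v([5, 3, 9, 4], [1, 4, 2, 4])
```

Because it works on ranks rather than raw values, this test needs no log transform and is less sensitive to a few devices with extreme error rates.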

  24. GDE
    Broken Beta release

  25. JSON
    GDE
    Event counters on broken Beta
    {
      "counts": {
        "ERROR": 1937,
        "WARNING": 1427,
        "DEBUG": 26319,
        "[E].SlaveUpdateManager.442": 1912,
        "[E].HTTPSManager.319": 3,
        "[E].DOTAManager.511": 1,
        ...
      },
      "masterVersion": "3976-aff309f9d073",
      "startTime": "2014-07-09 09:33:57.923",
      "endTime": "2014-07-09 15:02:37.032"
    }
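
Once a release is flagged, the per-call-site counters point at the culprit. The Python sketch below ranks the `[E].` counters visible on the slide (the elided entries are left out) and computes the top site's share of all errors; `top_error_sites` is a hypothetical helper.

```python
def top_error_sites(counts, limit=3):
    """Return the (key, count) pairs for '[E].' counters, largest first."""
    sites = [(key, n) for key, n in counts.items() if key.startswith("[E].")]
    return sorted(sites, key=lambda kv: kv[1], reverse=True)[:limit]


# Only the counters shown on the slide; the "..." entries are not reproduced.
broken_beta_counts = {
    "ERROR": 1937,
    "WARNING": 1427,
    "DEBUG": 26319,
    "[E].SlaveUpdateManager.442": 1912,
    "[E].HTTPSManager.319": 3,
    "[E].DOTAManager.511": 1,
}

top = top_error_sites(broken_beta_counts, limit=1)
share = top[0][1] / broken_beta_counts["ERROR"]  # fraction of all errors
```

Here a single call site in `SlaveUpdateManager` accounts for 1912 of the 1937 errors in the window.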

  26. GDE
    Fixed Beta release

  27. GDE
    The Future

  28. GDE

  29. GDE
    Thank You
    google.com/+DaleHumby
    @dalehumby
    http://goo.gl/Cwai02

  30. Experts
    Developer
    cloud.google.com
    $500 promo code: gde-in
