DevOps: Why Analytics Fail

Are you plagued by false alerts? Is your monitoring system asleep at the switch when real “situations” occur? Do you feel like your analytics are letting you down? This presentation explores why analytics fail and how you can implement monitoring strategies that succeed.

Elizabeth Nichols

May 03, 2016

Transcript

  1. The stories you are about to hear are true. The names have been changed to protect the innocent.

    www.netuitive.com @eanTweet
  2. Hype Cycle: Analytics for DevOps. Hyperbole Index 2016.

    [Chart: Market Visibility vs. Time; label: Analytics for DevOps]
  3. Types of Analytics

    • Off-line analytics (“Reporting”)
      o Trends over hours, weeks, or months
      o Optimization strategies
      o Recommendations
      o Business intelligence
    • Hybrid
    • Near real-time analytics (“Monitoring”)
      o Detection
      o Troubleshooting
      o Remediation
  4. Report: ASG Capacity vs Utilization

    [Chart over Time; series: ASG Group: # Nodes Provisioned (15), ASG Group: 95th Percentile CPU Utilization]
  5. Report: ASG Capacity vs Utilization

    [Same chart over Time; series: ASG Group: # Nodes Provisioned (15), ASG Group: 95th Percentile CPU Utilization; annotated with a run of dollar signs]
  6. Common Deterministic Analytics

    + Static Thresholds
    + Counting & Transformations
    + Eyeballing Dashboards
  7. False Negative: EC2 Cost Explosion

    Duration: 13 hours. Cost: $thousands. Anomalies unnoticed due to lack of automation.

    [Chart: Cost Per Hour vs. Hour; annotations: Credentials stolen, Credentials revoked]
  8. Counting + Transformations (Uni-Variate)

    • Delta: raw[n] → (raw[n] – raw[n-1])
    • Rate: raw[n] → (raw[n] / time)
    • Scale: raw[n] → (raw[n] * constant)
    • Min: raw[n] → min(raw[…])
    • Max: raw[n] → max(raw[…])
    • RHMAX: raw[n] → (raw[n] / max(raw))
    • Frequency: range(x) → # observations
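
The uni-variate transformations on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say what "time" or the window in `raw[…]` is, so the sketch assumes a fixed sampling interval in seconds and a running window over all samples seen so far. The function names and the `bin_width` parameter are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def delta(raw):
    """Delta: difference between each sample and the one before it."""
    return [cur - prev for prev, cur in zip(raw, raw[1:])]


def rate(raw, interval_seconds):
    """Rate: each sample divided by the (assumed fixed) sampling interval."""
    return [value / interval_seconds for value in raw]


def scale(raw, constant):
    """Scale: each sample multiplied by a constant (e.g. bytes to MB)."""
    return [value * constant for value in raw]


def running_min(raw):
    """Min: smallest value seen so far (assumed window: all samples to date)."""
    return list(accumulate(raw, min))


def running_max(raw):
    """Max: largest value seen so far (assumed window: all samples to date)."""
    return list(accumulate(raw, max))


def rhmax(raw):
    """RHMAX: each sample as a fraction of the series maximum."""
    peak = max(raw)
    return [value / peak for value in raw]


def frequency(raw, bin_width):
    """Frequency: number of observations per value range (bin lower edge -> count)."""
    return Counter(int(value // bin_width) * bin_width for value in raw)
```

For a cumulative counter such as total requests served, `delta` followed by `rate` yields requests per second, which is usually easier to threshold than the raw counter.
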
  9. Statistical Analytics Assumption: “What’s past is prologue*”

    *The Tempest by William Shakespeare, Act II, Scene I
  10. + Correlation Models + Machine Learning

    [Figure: correlation scale running from 1 through 0 to -1]
  11. Correlation

    [Figure: axes Revenues/sec and Requests/sec; points labeled t1 through t7; Confidence Interval = x%]
  12. Correlation Analytic

    Pearson Product Moment Coefficient of Correlation for two metrics X and Y:

    $$r = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{n} (x_i - \mu_x)^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \mu_y)^2}}$$
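
As a minimal sketch, the Pearson coefficient for two metric series can be computed directly from its definition. The function name `pearson_r` and the sample values in the usage note are illustrative and not taken from the deck.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length metric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mu_x = sum(xs) / len(xs)
    mu_y = sum(ys) / len(ys)
    # Numerator: sum of products of the deviations from each mean.
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys))
    # Denominator: product of the root sums of squared deviations.
    ss_x = sum((x - mu_x) ** 2 for x in xs)
    ss_y = sum((y - mu_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (math.sqrt(ss_x) * math.sqrt(ss_y))
```

Called with, say, requests/sec `[100, 200, 300, 400]` and revenues/sec `[2, 4, 6, 8]`, the result is 1 up to floating-point rounding, because the second series is an exact linear function of the first. A value near 0 would suggest the two metrics move independently.
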
  13. Learned Bands of Normalcy

    Values = Raw Metric Observations. “Bands of Normalcy”:

    • multi-variate: X_i → σ_n(X_i | X_j)
    • uni-variate: X_i → σ_n(X_i^W)

    [Chart: values from 0 to 200 over time, 05/01 to 05/08]
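
The deck does not say how the bands are learned, so the following is only a sketch of one common uni-variate approach, assumed here for illustration: a band of mean ± k standard deviations over a sliding window of recent observations. The names `band` and `flag_anomalies`, the window size, and `k` are all hypothetical.

```python
from statistics import mean, pstdev


def band(history, k=3.0):
    """Return (lower, upper) bounds: mean +/- k standard deviations of history."""
    mu = mean(history)
    sigma = pstdev(history)
    return mu - k * sigma, mu + k * sigma


def flag_anomalies(values, window, k=3.0):
    """For each point after the first `window` samples, report whether it
    falls outside the band learned from the preceding `window` samples."""
    flags = []
    for i in range(window, len(values)):
        lower, upper = band(values[i - window:i], k)
        flags.append((i, not (lower <= values[i] <= upper)))
    return flags
```

A multi-variate band would condition the expected range of one metric on the current values of correlated metrics, which this sketch does not attempt.
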
  14. Learning @ Work

    [Chart: Deviation from Norm; series: observed values, uni-variate band of normalcy (narrowing)]
  15. Nuance: New Normal

    [Chart: CPU Utilization; series: observed values, uni-variate band, multi-variate band; labels: t2.small (1 vcpu), m4.large (2 vcpu), Save $$$]
  16. Nuance: Bad Change

    [Chart: Response Time; series: observed values, uni-variate band, multi-variate band; labels: Bad, Good, SLA Not Met]
  17. Nuance: Memory Leak

    Algorithm (only if no anomalies):

    1. Take the sequence of relative minimum values.
    2. Fit a linear regression model. Test goodness of fit.
    3. Use the model to predict future relative min values.
    4. Alarm two hours before the predicted relative min of heap used = 90%.

    [Chart: % Heap Used over time, with repeated garbage collections]
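
The four steps on this slide can be sketched as follows. This is an illustration under assumptions, not the presenter's implementation: samples are `(time_in_hours, percent_heap_used)` pairs, goodness of fit is checked with R² against a hypothetical `min_r2` cutoff, and all function and parameter names are invented for the example.

```python
def relative_minima(samples):
    """Step 1: keep points lower than both neighbours (post-GC troughs)."""
    return [samples[i] for i in range(1, len(samples) - 1)
            if samples[i][1] < samples[i - 1][1] and samples[i][1] < samples[i + 1][1]]


def fit_line(points):
    """Step 2: least-squares line through points; returns (slope, intercept, r2)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    ss_res = sum((v - (slope * t + intercept)) ** 2 for t, v in points)
    ss_tot = sum((v - mean_v) ** 2 for _, v in points)
    r2 = 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return slope, intercept, r2


def should_alarm(samples, now, threshold=90.0, lead_hours=2.0,
                 min_r2=0.9, anomalies_present=False):
    """Steps 3-4: alarm if the fitted trough line is predicted to reach
    `threshold` within `lead_hours` of `now`."""
    if anomalies_present:          # the slide's precondition: only if no anomalies
        return False
    minima = relative_minima(samples)
    if len(minima) < 3:            # too few troughs to trust a trend (assumed cutoff)
        return False
    slope, intercept, r2 = fit_line(minima)
    if slope <= 0 or r2 < min_r2:  # no upward trend, or a poor fit
        return False
    hours_left = (threshold - intercept) / slope - now
    return hours_left <= lead_hours
```

Fitting only the troughs, rather than every sample, ignores the sawtooth produced by normal allocation between garbage collections and tracks only the memory that survives each collection.
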
  18. Math is the Queen of the Sciences …. Impressive, but not enough.
  19. Contact Info

    Elizabeth Nichols, Ph.D., Chief Data Scientist
    [email protected], @eantweet
    www.netuitive.com, @netuitive, (703) 464-1500