Finding Meaning in Operational Data

Overview of what operational data is, where to start, and how to leverage it.

Brad Lhotsky

June 20, 2017

Transcript

  1. Who Am I? • Systems and Security at Craigslist • Infrastructure Monitoring & Security at Booking.com • Recovering • Perl Programmer • Linux/BSD Systems Admin • Network Security Specialist • PostgreSQL Administrator • ElasticSearch Janitor • DNS Voyeur • OSSEC Core Team Member
  2. Administrative & Meta-Data ‣ Inventory ‣ Hardware ‣ Software ‣ Builds or Roles ‣ Services ‣ Users and Groups ‣ Employee/Contractor ‣ Managers ‣ ACLs
  3. Monitoring ‣ State ‣ OK / NOT OK ‣ UP / DOWN ‣ Package Version ‣ Time Series ‣ Counter ‣ Rate ‣ Statistical Summaries
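The "Counter" and "Rate" entries above are related: a rate is usually derived from successive counter samples. A minimal sketch of that conversion follows. The function name `counter_to_rates`, the sample layout, and the choice to skip an interval when the counter resets are assumptions made for illustration, not something the slides specify.

```python
from typing import List, Tuple

# Hypothetical sample layout: (unix_timestamp_seconds, counter_value)
Sample = Tuple[float, float]


def counter_to_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Turn increasing counter samples into per-second rates.

    A drop in the counter is treated as a reset (for example after a
    restart); that interval is skipped instead of yielding a negative rate.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or v1 < v0:
            continue  # out-of-order sample or counter reset
        rates.append((t1, (v1 - v0) / elapsed))
    return rates


samples = [(0, 100), (60, 160), (120, 280), (180, 10), (240, 70)]
rates = counter_to_rates(samples)
print(rates)
```

The interval ending at t=180 is dropped because the counter went backwards, so only three rates come out of five samples.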
  4. Events ‣ State Changes ‣ Package Updated ‣ Service Stopped ‣ System Events ‣ Syslog Message ‣ SNMP Traps ‣ Application Events ‣ Access ‣ Errors ‣ Traces
  5. “Metrics are your culture.” - Nicole Forsgren, "How Metrics Shape Your Culture", Monitorama PDX 2016
  6. "How Metrics Shape Your Culture" • You can't improve what you don't measure • Always measure things that matter • Things measured are things managed • Metrics can be gamed • Metrics inform incentives • Not everything that can be counted counts • Hard to measure doesn't mean it isn't worth measuring
  7. Automation ‣ Can a Machine read my data? ‣ Autoscale ‣ Trend detection ‣ Service Level Roll Ups ‣ Reporting
  8. Alerting ‣ Disrupting People's Lives at 95% Disk Full ‣ Thresholds -> Change Detection ‣ State Changes
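One possible reading of "Thresholds -> Change Detection" is to notify only when a check changes state, rather than every time a reading sits above the threshold. The sketch below illustrates that reading only. The function name `state_changes` and the reading format are hypothetical, and the 95% default is borrowed from the disk-full example on the slide.

```python
from typing import Iterable, List, Tuple


def state_changes(
    readings: Iterable[Tuple[str, float]], threshold: float = 95.0
) -> List[Tuple[str, str]]:
    """Return one event per OK / NOT OK transition.

    The first reading only establishes the baseline state, so it never
    produces an event on its own.
    """
    events = []
    previous = None
    for timestamp, percent_used in readings:
        state = "NOT OK" if percent_used >= threshold else "OK"
        if previous is not None and state != previous:
            events.append((timestamp, f"{previous} -> {state}"))
        previous = state
    return events


readings = [("10:00", 90), ("10:01", 96), ("10:02", 97), ("10:03", 98), ("10:04", 80)]
events = state_changes(readings)
print(events)
```

Three consecutive readings above 95% produce a single "OK -> NOT OK" event instead of three separate alerts.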
  9. Anomaly Detection • Anomalies != Alerts • 1 million metrics per minute • 0.3% at a distance of > 3σ • 4.3 Million Anomalies / day
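The arithmetic behind the 4.3 million figure can be checked directly. The sketch below assumes each metric reports once per minute. It also shows that the slide's 0.3% is close to the two-sided tail of a normal distribution beyond 3σ, which is about 0.27%. The variable names are illustrative.

```python
import math

metrics_per_minute = 1_000_000
outlier_fraction = 0.003          # the slide's 0.3% beyond 3 sigma
minutes_per_day = 24 * 60

anomalies_per_day = metrics_per_minute * outlier_fraction * minutes_per_day
print(f"{anomalies_per_day:,.0f} anomalies per day")

# Two-sided tail probability of a normal distribution beyond 3 sigma.
normal_tail = math.erfc(3 / math.sqrt(2))
print(f"{normal_tail:.4%} of normally distributed points lie beyond 3 sigma")
```

Even with perfectly well-behaved data, a 3σ rule at this volume flags millions of points a day, which is why the slide separates anomalies from alerts.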
  10. Statistics Sidebar • Your data probably isn't normal • You need to use modeling to perform anomaly detection • The residuals should fit a normal distribution • Modeling is only possible if you can explore and interact with the data • There are algorithms and their parameters matter
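To make the modeling point concrete, here is a minimal sketch that fits a very simple seasonal model (the mean of each position in a repeating cycle), subtracts it, and applies a 3σ rule to the residuals instead of the raw values. The model choice, the function names, and the synthetic data are all assumptions for illustration. The sketch does not test whether the residuals are actually normal, which a real analysis would need to do.

```python
import statistics
from typing import List


def seasonal_residuals(values: List[float], period: int) -> List[float]:
    """Subtract the per-position mean of the cycle (a deliberately simple model)."""
    profile = [statistics.fmean(values[i::period]) for i in range(period)]
    return [v - profile[i % period] for i, v in enumerate(values)]


def flag_anomalies(values: List[float], period: int, sigmas: float = 3.0) -> List[int]:
    """Return indexes whose residual lies more than `sigmas` deviations from zero."""
    residuals = seasonal_residuals(values, period)
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []  # the model explains the data exactly; nothing to flag
    return [i for i, r in enumerate(residuals) if abs(r) > sigmas * spread]


# Synthetic series: a repeating 4-step pattern with one injected spike.
values = [10, 20, 30, 20] * 10
values[17] = 60
flagged = flag_anomalies(values, period=4)
print(flagged)
```

Both the model and its parameters (here `period` and `sigmas`) change what gets flagged, which matches the slide's last point.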
  11. Open and Extensible ‣ Integrations with other projects ‣ Open API ‣ Good Documentation ‣ Modular / Plugin Structure ‣ Community
  12. Retention ‣ How easy is it to age data off? ‣ What regulations or laws apply to the data? ‣ Expectations from: ‣ Customers ‣ Employees ‣ Managers ‣ Peers ‣ Legal
  13. Privacy and Security ‣ What do you keep on your users in your ops data? ‣ Who might come calling for it? ‣ How comfortable are you handing it over to Trump? ‣ Anyone hear about MongoDB? ‣ Can the store provide RBAC?
  14. [MySQL] ‣ Large Community ‣ Forks and Oracle ‣ Performance First ‣ SQL Interface ‣ Limited Data Types ‣ Web > BI ‣ Suitable for Administrative Data
  15. [PostgreSQL] ‣ Large Community ‣ Reliability First ‣ Open and Extensible ‣ PGXN ‣ CitusData ‣ Greenplum ‣ EnterpriseDB ‣ Native Support for IP Addresses ‣ Extensible Data Types ‣ Suitable for Administrative Data
  16. [Graphite] ‣ Large Community ‣ Interchangeable Components, à la microservices ‣ Simple API ‣ Rampant Open Source Adoption ‣ Scalable ‣ Compatibility ‣ Grafana, Statsd, Riemann, Bosun, Cabot, Seyren ‣ etc., etc. ‣ Suitable for Time Series Data ‣ Smallest Resolution: seconds
  17. [OpenTSDB] ‣ Metrics 2.0 (metrics support tagging) ‣ Hadoop / HBase backed ‣ SQL-like Language ‣ Zero Data Loss ‣ Compatibility ‣ Carbon, Grafana, Statsd, Riemann, Bosun ‣ Suitable for Time Series Data ‣ Smallest Resolution: milliseconds
  18. [InfluxDB] ‣ Metrics 2.0 (metrics support tagging) ‣ SQL-like Language ‣ Zero Data Loss ‣ Compatibility ‣ Carbon, Grafana, Statsd, Riemann, Bosun ‣ Suitable for Time Series Data ‣ Smallest Resolution: nanoseconds
  19. [ElasticSearch] ‣ Well Documented API ‣ Many Open Source Integrations ‣ Lucene backed text search ‣ Scalable ‣ "Jepsen ElasticSearch" re:CAP
  20. Data Types
    Store          Meta-Data  State  Time Series  Events
    Graphite       No         No     Yes          Kinda
    InfluxDB       Kinda      Yes    Yes          Kinda
    OpenTSDB       Kinda      Yes    Yes          Kinda
    MySQL          Yes        Yes    No           Kinda
    PostgreSQL     Yes        Yes    No           Kinda
    ElasticSearch  No         Kinda  Kinda        Yes
  21. Features of Your Data
    Store          Interval           Cardinality              Data Type    Aging
    Graphite       Fixed, Regular     Low                      Numeric      Roll up
    InfluxDB       Fixed (best), Any  High                     Any          Configurable
    OpenTSDB       Any                High                     Numeric      n/a
    MySQL          Any                Keys: Low, Values: High  Structured*  None
    PostgreSQL     Any                Keys: Low, Values: High  Structured*  None
    ElasticSearch  Any                Keys: Low, Values: High  Any          None
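The "Roll up" aging listed for Graphite means keeping older data at a coarser resolution instead of deleting it outright. The sketch below is a generic illustration of that idea, averaging fine-grained points into fixed-width buckets. It is not Graphite's implementation, and the function name `roll_up` and the point layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def roll_up(points: List[Tuple[int, float]], bucket_seconds: int) -> List[Tuple[int, float]]:
    """Average (timestamp, value) points into buckets of `bucket_seconds`.

    Each output point is stamped with the start of its bucket.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


# Ten-second points rolled up into one-minute averages.
points = [(0, 1.0), (10, 2.0), (20, 3.0), (60, 4.0), (70, 6.0)]
rolled = roll_up(points, bucket_seconds=60)
print(rolled)
```

Averaging is only one choice of aggregation; a counter-like series might need a sum or a maximum instead, so the method should match the data type.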
  22. Features of the Store
    Store          Security  Scalability  Performance  Reliability
    Graphite       Low       High         High*        Medium
    InfluxDB       Low       High         Medium       High
    OpenTSDB       Low       High         Low          High
    MySQL          Medium    Medium       Medium*      High*
    PostgreSQL     High      Medium       Medium*      High
    ElasticSearch  Low       High         High         Low