Hard Earned Lessons in Observability, Security, and Life

In the age of AI, this talk will be abnormally focused on the humans the machines are meant to serve. We'll look at successful security and observability solutions and how they map to the human behind the keyboard. Learn from more than two decades of mistakes in both to build successful and celebrated security and observability programs without them.

Brad Lhotsky

March 29, 2026

Transcript

  1. Context • NIA Intramural Research Program • Tiny budget, high impact • Scientific Method and Statistics • Booking.com • 3.2 Billion EUR revenue, 35% OpEx, 40% YoY Growth • Data-driven approach to everything • craigslist • 1.2 Billion USD Revenue, sub 10% OpEx • Focus on efficiency and ownership • Unashamed Perl Programmer My Background
  2. Werner Heisenberg "What we observe is not nature itself, but nature exposed to our method of questioning."
  3. Understand Humans to Improve Technology • Everything is a communication problem • Talk to people • Conflict is a resource • Use conflict to build value • Understand People • "Predictably Irrational" • The Betty Crocker Experiment • Tokens vs Money • Pepsi and Bad Customer Experiences It's always DNS people
  4. Human Error Doesn't Exist* • Human error is the start of the investigation, not the end • Think in systems • Evaluate actions and actors in the context of the system • Look at incentives and penalties • Toyota's "Five Whys" * It's complicated
  5. Don Norman - The Design of Everyday Things "Humans err continually; it is an intrinsic part of our nature. System design should take this into account."
  6. Policy & Enforcement Don't Change Behavior • What affects driver speed more? • Speed limits? • Paint? • Use behavior to write your policy • Change the system, not the policy • Update the policy once behavior changes
  7. 90% of My Ideas are not Good • Most of our ideas/changes don't really move the needle in terms of business impact • It's important to be able to experiment and find the ones that do • Blameless Culture • Fast Deployments and faster reverts • Observability tied to key business metrics
  8. Pareto was at least 80% Right • The 80/20 Rule • 80% of revenue from 20% of the customers • 80% of the features from the first 20% of the work • Don't launch feature complete, get 80% there and then test. You can pivot if your feature wasn't a good idea.
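
The 80/20 claim on this slide is easy to check against your own numbers. A minimal Python sketch follows; the `top_share` helper and the revenue figures are hypothetical and invented purely for illustration, not taken from the talk.

```python
def top_share(values, top_fraction=0.2):
    """Return the share of the total contributed by the largest `top_fraction` of values."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values, reverse=True)
    # Always include at least one item so tiny lists still produce an answer.
    k = max(1, round(len(ordered) * top_fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


# Hypothetical revenue per customer, invented for illustration.
revenue = [5000, 3000, 400, 350, 300, 250, 250, 200, 150, 100]
share = top_share(revenue)
print(f"Top 20% of customers bring in {share:.0%} of revenue")
```

Running the same function over real customer or feature-usage data shows whether the distribution is actually that concentrated before it is used to justify a decision.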
  9. More Data Isn't Enough • Vendors will sell you on sending and collecting more data • But wait, don't they bill based on data volume? • The data you have is more valuable than the data you might have one day • Zoom in on specific problems, and then compare with larger populations A bird in hand ...
  10. FDAP • Arrow Flight - Exchange Protocol • DataFusion - Query Engine • Arrow - In-Flight Format • Parquet - At Rest Format
  11. Data Without Analytical Skills is Dangerous • Take a statistics class every year • Observability datasets are not Normal • Apply the Scientific Method • Assume you're wrong (I usually am!) • Test, re-test, and test again • Peer Review • Familiarize yourself with logical fallacies "Lies, damn lies, and statistics"
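
To illustrate "Observability datasets are not Normal", here is a small Python sketch using only the standard library. The latency values are hypothetical, chosen to mimic a long right tail; the point is that the mean describes no request that actually happened, which is why medians and percentiles are usually the safer summary.

```python
import statistics

# Hypothetical request latencies in milliseconds, invented for illustration.
# Most requests are fast and a few are very slow: a long right tail.
latencies_ms = [20] * 90 + [50] * 5 + [2000] * 5

mean = statistics.mean(latencies_ms)
median = statistics.median(latencies_ms)
# quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
p95 = statistics.quantiles(latencies_ms, n=100, method="inclusive")[94]

print(f"mean={mean:.1f}ms median={median}ms p95={p95:.1f}ms")
```

In this made-up sample the mean is roughly six times the median, so any alert or comparison built on "average latency" would mostly be measuring the tail.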
  12. Context Matters • Which warrants engineering resources? A. p95 load time for checkout is 2s B. p95 load time for search is 500ms C. Both D. Neither E. Not enough information Observability Tenets can lead you astray
  13. Data doesn't lie, but it might not tell the truth either correlation <> causation
  14. Precision Doesn't Matter* • Observability vs Forensic Audit Log • Fast, discoverable data is better than exact data • Rare events are rare • Don't obsess over deduplication or missing data You just need to be more "correct" than your competitors * Except when it does?
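
One way to build intuition for "don't obsess over missing data" is to throw most of a dataset away and see how far a summary statistic moves. The Python sketch below is an illustration under stated assumptions: the latencies are synthetic (log-normal, seeded), and the loss is uniformly random. Data lost in a biased way, or data needed for a forensic audit, is exactly the "except when it does" case.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical latencies in milliseconds drawn from a skewed distribution.
full = [100 * rng.lognormvariate(0, 0.5) for _ in range(100_000)]

# Simulate losing 90% of events at random and keeping only 10%.
kept = rng.sample(full, k=len(full) // 10)


def p95(values):
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(values, n=100)[94]


full_p95 = p95(full)
kept_p95 = p95(kept)
relative_gap = abs(kept_p95 - full_p95) / full_p95
print(f"full p95={full_p95:.0f}ms, 10% sample p95={kept_p95:.0f}ms, gap={relative_gap:.1%}")
```

If the gap is small compared with the differences you are trying to detect, the missing events are unlikely to change the decision.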
  15. Build or Buy? Build ✅ Exactly what you want ✅ Workflows inform design ✅ Creates expertise ❌ Takes time Buy ✅ Faster ❌ Design informs workflows ❌ Almost what you want, until they upgrade to Tahoe
  16. Maybe Not AI? • Major Ethical Problems • Major Environmental Problems • Major Societal Problems • Loss of Knowledge / Skill • Mathematically Unsound? • Gödel's Incompleteness Theorems • Conservation of Information • Prompt Injections still not solved • CaMeL is interesting
  17. Notable Security Software • Snort / Suricata - IDS • OSSEC-HIDS - Host-based IDS / EDR • osquery - Makes computers queryable • Zeek - Network Data Capture • Wazuh - Successor to OSSEC
  18. Notable Observability Software • Graphite - Started "Observability" • statsd - Made APM accessible • Elasticsearch - Made logs/events accessible • Grafana - Made observability data comparable • ClickHouse - Made observability data cheap and fast
  19. Inspiration • The Design of Everyday Things - Don Norman • The Demon-Haunted World - Carl Sagan • High Conflict - Amanda Ripley • Four Thousand Weeks - Oliver Burkeman • Invisible Influence - Jonah Berger • Deep Work - Cal Newport • The Visual Display of Quantitative Information - Edward Tufte • Information Dashboard Design - Stephen Few • Fluke - Brian Klaas • Predictably Irrational - Dan Ariely*