Starting, growing, and scaling your host intrusion detection efforts

Osquery is a lightweight host intrusion detection tool that organizations can use to monitor extremely large production environments as well as smaller corporate environments. In this talk, we will discuss how to get started with osquery and how the way you manage osquery may change as your organization and objectives evolve. When starting small with an initial PoC, it's important to demonstrate a full detection pipeline as quickly and simply as possible. Over time, as you instrument more environments at your organization, the tools available for device configuration and communication will likely change. With many environments to monitor, we can take advantage of more osquery features that let us reason succinctly and dynamically about attack surface based on system state. As we talk through this evolution, we will discuss proven strategies and common pitfalls.


Mike Arpaia

April 05, 2018


  1. Starting, growing, and scaling your host intrusion detection efforts
      Mike Arpaia, Co-Founder & CTO @ Kolide
  2. Today’s Agenda
      • Walk through the lifecycle of introducing and scaling a host intrusion detection function at an organization
      • Explore the minimal set of osquery features that are needed throughout the lifecycle of rolling out a deployment
      • Focus on accomplishing appropriate organizational objectives unobtrusively
      • Keep an eye on common pitfalls and gotchas all along the way
      • Focus entirely on the open-source ecosystem
  3. My HIDS Background
      • Software Engineer @ Etsy
         • Built, deployed, and open-sourced a custom Mac HIDS
      • Software Engineer @ Facebook
         • Designed, wrote, deployed, and open-sourced osquery
      • Engineering Manager @ Facebook
         • Supported the intrusion detection infrastructure team
      • Co-Founder & CTO @ Kolide
         • Building scalable software and large cloud infrastructure for SaaS intrusion detection
  4. Open Source Disclaimer
      • At Kolide, we open-source many of the tools that we build for managing osquery deployments
      • While we use osquery to deliver the insights in our cloud product, managing osquery is not what we’re trying to sell
      • Managing your host instrumentation infrastructure should be a democratized, commodity capability
      • We’re not the only show in town with open-source tools for this stuff
         • Fleet alternatives: SGT (Okta), Windmill (Heroku), Doorman (Marcin)
         • Launcher alternatives: DIY!
  5. Initial Assumptions
      • This talk is all about introducing host intrusion detection at a company with roughly the following properties, values, and capabilities:
         • Mostly macOS desktops, with some Windows and Linux
         • All Linux in production (some bare metal, some cloud VMs, multiple providers)
         • Some Windows servers in a DMZ (domain controller, Exchange server)
         • Strong desire to self-host open-source software
         • No budget, no vendors, no closed-source tools
         • Established ability to deploy, configure, and update software in all environments
         • An existing logging/alerting pipeline that can parse JSON
  6. Roll-Out Strategy
      • Start with your desktop environment
         • Attacks are more likely to start there
         • Desktop users will tell you if something is going wrong
      • Take on more organizational risk only as you gain operational experience
         • Start with your own team, then the teams you work with
         • Ideally, start with teams that sit close to you
  7. Step-by-Step Deployment
      • Step 1: Initial R&D
      • Step 2: Remote Configuration
      • Step 3: Desktop Deployment
      • Step 4: Initial Production Deployment
      • Step 5: Complete Production Deployment
      • Step 6: Mergers and Acquisitions
  8. Step 1: Initial R&D
      Test a deployment on your local

  9. Osquery Introduction
      • Write SQL to articulate facts about operating system state
      • Use the queries to monitor the system in configurable ways
         • How do the results of this query change over time?
         • What is the result of this query on all of my hosts?
         • Tell me the results of this query every 24 hours
      • Poll- and event-based models for emitting events
      • Efficient in both production and corporate environments
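To make the "how do the results of this query change over time?" mode concrete, here is a minimal Python sketch of differential results, assuming each run of a scheduled query yields a list of column/value dictionaries. It mirrors the `"added"`/`"removed"` actions that appear in osquery's result logs, but `diff_results` is a hypothetical illustration, not osquery's own implementation.

```python
import json


def _row_key(row):
    """Canonical, hashable form of a result row (a dict of column -> value)."""
    return json.dumps(row, sort_keys=True)


def diff_results(previous, current):
    """Compare two runs of the same query and return differential events.

    Rows only in the newer run are "added"; rows only in the older run are
    "removed". Identical rows are treated as a single row in this sketch.
    """
    prev_rows = {_row_key(r): r for r in previous}
    curr_rows = {_row_key(r): r for r in current}
    events = []
    for key, row in curr_rows.items():
        if key not in prev_rows:
            events.append({"action": "added", "columns": row})
    for key, row in prev_rows.items():
        if key not in curr_rows:
            events.append({"action": "removed", "columns": row})
    return events
```

For example, if the first run of `select * from users` returns only `root` and the second returns `root` and `victor`, the diff contains a single `"added"` event for `victor`, which is what ends up in the logs instead of the full table.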
  10. Configuration and Logs
      • Config In, Logs Out
      • Osquery uses a declarative JSON configuration
      • You specify what queries you want to get logs for
      • Osquery logs the results of the queries as requested
      • Both configuration and logging are customizable
  11. Use the Default Plugins
      • How osquery acquires configuration and how it publishes logs are each configurable via “plugins” (filesystem, tls, etc.)
      • Host intrusion detection is hard and can become a complex data problem as deployments grow
      • Let’s use the filesystem to configure osquery and view logs
      • Focus on finding data that will be useful in your pipeline
  12. Example Config
        {
          "packs": {
            "desktop_monitoring": {
              "queries": {
                "users": {
                  "query": "select * from users",
                  "interval": 60
                }
              }
            }
          }
        }
  13. Example Logs
        {
          "name": "pack:desktop_monitoring:users",
          "hostIdentifier": "FA01680E-98CA-5557-8F59-7716ECFEE964",
          "calendarTime": "Sun Apr 1 20:04:45 2018 UTC",
          "unixTime": 1522613085,
          "decorations": { "osquery_version": "2.11.2" },
          "columns": {
            "description": "victor",
            "directory": "/Users/victor",
            "gid": "20",
            "gid_signed": "20",
            "shell": "/bin/zsh",
            "total_seconds": "371714",
            "uid": "501",
            "uid_signed": "501",
            "username": "victor",
            "uuid": "00A4617C-AC3A-4CEC-95F4-3143161820DA"
          },
          "action": "added"
        }
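With the filesystem logger, osquery writes each result as one JSON object per line (by default to `osqueryd.results.log` in the `--logger_path` directory). A minimal Python sketch for turning those lines into flat events a pipeline can consume is below; the function name `parse_results`, the output field names, and the choice to skip malformed lines are assumptions for illustration, not part of osquery.

```python
import json
from typing import Iterable, Iterator


def parse_results(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a flattened event for each JSON result line.

    Blank or malformed lines are skipped rather than aborting the stream.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        yield {
            "query": record.get("name"),
            "host": record.get("hostIdentifier"),
            "time": record.get("unixTime"),
            "action": record.get("action"),
            "columns": record.get("columns", {}),
        }
```

Because it accepts any iterable of strings, the same function works on an open log file, on lines read by a forwarder, or on a list in a unit test.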
  14. Getting value from the logs
      • We’ve used the tools to articulate a configuration that produces results that are useful in our environment
      • Local tools are great for screencasts but not for critical data pipelines
      • Let’s explore how to distribute configuration, aggregate logs uniformly, and ingest logs into your pipeline
  15. Step 2: Remote Configuration
      Configure, track, and query multiple endpoints

  16. Config Distribution Options
      • Filesystem
         • Pro: Simple, reliable
         • Con: Requires mature configuration management tools
         • Con: Live query and log collection still need a separate solution
      • TLS / gRPC
         • Pro: Many open-source options
         • Pro: Live query and centralized logging usually included
         • Con: Server component required
  17. Log Acquisition Options
      • Filesystem
         • Pro: Simple, reliable
         • Con: Requires existing log forwarding tools on the endpoint
      • TLS / gRPC
         • Pro: Single, well-understood way to receive logs and ingest them into your pipeline
         • Con: Server component required
      • AWS
         • Pro: Robust logging pipeline integrations for users that are committed to AWS
         • Con: Requires AWS
  18. Live Query Options
      • TLS / gRPC
         • Really the only game in town
      • You can fake this with certain scheduled query options
  19. Advice: Consider TLS / gRPC
      • Many open-source options
      • Easy to create a custom server if necessary
      • Decouple distribution from configuration and logging
      • Operational overhead is low
      • Fancy live query UI, say what?
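To show why a custom server is approachable, here is a minimal Python sketch of the request handling behind osquery's TLS enroll, config, and logger endpoints. It assumes the JSON request bodies osquery sends (`enroll_secret` and `host_identifier` on enrollment, `node_key` afterwards, plus `log_type` and `data` for logs). The paths, the secret, the in-memory storage, and the `dispatch` helper are hypothetical, and the HTTPS listener itself is left out.

```python
import json
import secrets

# Hypothetical values for illustration only; a real server would load these
# from its own storage and configuration.
ENROLL_SECRET = "example-enroll-secret"
OSQUERY_CONFIG = {
    "schedule": {"users": {"query": "SELECT * FROM users;", "interval": 60}}
}

node_keys = {}       # node_key -> host_identifier
received_logs = []   # (host_identifier, log_type, record) tuples


def handle_enroll(body):
    """Exchange a valid enroll secret for a per-host node key."""
    offered = str(body.get("enroll_secret", "")).encode("utf-8")
    if not secrets.compare_digest(offered, ENROLL_SECRET.encode("utf-8")):
        return {"node_invalid": True}
    node_key = secrets.token_hex(16)
    node_keys[node_key] = body.get("host_identifier", "unknown")
    return {"node_key": node_key, "node_invalid": False}


def handle_config(body):
    """Return the osquery configuration to an enrolled host."""
    if body.get("node_key") not in node_keys:
        return {"node_invalid": True}
    return OSQUERY_CONFIG


def handle_log(body):
    """Store a batch of result or status logs from an enrolled host."""
    host = node_keys.get(body.get("node_key"))
    if host is None:
        return {"node_invalid": True}
    for record in body.get("data", []):
        received_logs.append((host, body.get("log_type"), record))
    return {}


# Hypothetical paths; osqueryd is pointed at them with --enroll_tls_endpoint,
# --config_tls_endpoint and --logger_tls_endpoint.
ROUTES = {"/enroll": handle_enroll, "/config": handle_config, "/log": handle_log}


def dispatch(path, raw_body):
    """Route a JSON POST body to its handler and return the JSON response."""
    return json.dumps(ROUTES[path](json.loads(raw_body)))
```

Answering `{"node_invalid": true}` tells osqueryd to re-enroll, which is why every handler checks the node key first. In practice the dictionaries would be a database and `dispatch` would sit behind an HTTPS server.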
  20. Setting Up Kolide Fleet

  21. Your Osquery Configuration

  22. Step 3: Desktop Deployment
      Proliferate your packages in the desktop

  23. Enrolling Hosts
      • Make a package which, upon installation, will enroll a host with your osquery server
      • Some folks use two packages: one for osquery binaries, one for initial configuration
         • See for binary packages
      • Others distribute a single package which combines the osquery binaries with initial configuration
         • See Kolide Launcher’s package-builder tool
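The "initial configuration" package mostly boils down to an osquery flagfile plus an enroll secret on disk. A minimal Python sketch that renders such a flagfile is below; the flag names are real osquery flags, but the endpoint paths, install locations, and the helper name `build_flagfile` are hypothetical placeholders to adapt to your server.

```python
def build_flagfile(hostname: str, secret_path: str, ca_path: str) -> str:
    """Render an osquery flagfile that enrolls the host with a TLS server.

    The endpoint paths below are placeholders; use whatever your server exposes.
    """
    flags = {
        "tls_hostname": hostname,            # host[:port] of the osquery server
        "tls_server_certs": ca_path,         # CA bundle used to verify the server
        "enroll_secret_path": secret_path,   # file shipped alongside the flagfile
        "enroll_tls_endpoint": "/api/enroll",
        "config_plugin": "tls",
        "config_tls_endpoint": "/api/config",
        "logger_plugin": "tls",
        "logger_tls_endpoint": "/api/log",
        "host_identifier": "uuid",           # stable identifier across renames
    }
    # One --name=value pair per line, the format osqueryd reads via --flagfile.
    return "".join(f"--{name}={value}\n" for name, value in flags.items())
```

A packaging step would write this output to something like `/etc/osquery/osquery.flags`, install the secret at `secret_path` with restrictive permissions, and start osqueryd with `--flagfile` pointing at that file.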
  24. Using Package Builder

  25. Distributing Osquery
      • Options for various operating systems
         • macOS: Chef, Munki, MicroMDM
         • Windows: Chef, SCCM
         • Linux: Chef, Puppet, Ansible
      • Kolide Launcher includes autoupdate capability
         • Update machinery uses TUF and Docker Notary
  26. Write Queries and Alerts
      • The desktop environment contains a lot of low-hanging fruit
      • Use existing queries to find malicious activity
      • Write your own and share them!
      • Gain insight into misconfigured system states
      • Establish a working, productive pipeline with internal tools
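Once results flow into a pipeline, alerts are often just predicates over result events. Here is a minimal Python sketch, assuming events shaped like the example log above (`name`, `hostIdentifier`, `action`, `columns`). The `Rule` type, the single "new local user" rule, and `evaluate` are hypothetical illustrations, not an existing osquery or Fleet feature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Rule:
    """A hypothetical alert rule over osquery result events."""
    query: str                        # value of the log's "name" field
    matches: Callable[[Dict], bool]   # predicate on the whole result event
    message: str


RULES = (
    Rule(
        query="pack:desktop_monitoring:users",
        matches=lambda event: event.get("action") == "added",
        message="New local user account",
    ),
)


def evaluate(event: Dict, rules: Sequence[Rule] = RULES) -> List[Dict]:
    """Return one alert per rule that the result event triggers."""
    return [
        {
            "alert": rule.message,
            "host": event.get("hostIdentifier"),
            "columns": event.get("columns", {}),
        }
        for rule in rules
        if event.get("name") == rule.query and rule.matches(event)
    ]
```

Keying rules on the differential `action` means an alert fires when something changes on a host, rather than on every scheduled run.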
  27. Step 4: Initial Production Deployment
      With confidence from the desktop environment, start deploying osquery to high-value production servers
  28. Important Osquery Options
      • Optimize the osquery query schedule
         • osqueryd --help | grep schedule
      • Configure the event-based monitoring
         • osqueryd --help | grep events
      • Configure the utilization watchdog
         • osqueryd --help | grep watchdog
      • Configure osquery extensions
         • osqueryd --help | grep extensions
  29. Packaging Osquery in Prod
      • You will likely need to use an internal packaging environment
      • Osquery publishes signed Linux packages and binaries
      • Statically distribute osquery and/or Launcher in a container
      • Do what makes sense in your production environment
  30. Behavioral Configuration
      • Use labels to gate the distribution of packs
      • Hosts only join labels when they start exhibiting certain behaviors
         • Only monitor for certain remote exploitation techniques if hosts are binding to a TCP port
         • Only monitor for MySQL misconfiguration if MySQL is running on the host
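Label gating can be modelled as two steps: a host belongs to a label when that label's query returns rows, and a pack is sent only to hosts in its target labels. The minimal Python sketch below assumes that model; the label names, queries, and pack names are hypothetical, and osquery packs also offer a native `discovery` list of queries that gates a pack on the host itself.

```python
from typing import Dict, List, Set

# Hypothetical label definitions: a host joins a label when its query returns rows.
# protocol = 6 is TCP in the listening_ports table.
LABELS: Dict[str, str] = {
    "listening_tcp": "SELECT 1 FROM listening_ports WHERE protocol = 6;",
    "mysql_running": "SELECT 1 FROM processes WHERE name = 'mysqld';",
}

# Hypothetical pack targeting: each pack is only sent to hosts in these labels.
PACK_TARGETS: Dict[str, Set[str]] = {
    "remote_exploitation": {"listening_tcp"},
    "mysql_misconfiguration": {"mysql_running"},
}


def label_membership(label_results: Dict[str, List[dict]]) -> Set[str]:
    """A host is in a label if that label's query returned at least one row."""
    return {label for label, rows in label_results.items() if rows}


def packs_for_host(label_results: Dict[str, List[dict]]) -> List[str]:
    """Return the packs a host should receive, given its label query results."""
    labels = label_membership(label_results)
    return sorted(pack for pack, targets in PACK_TARGETS.items() if targets & labels)
```

Here `label_results` maps each label name to the rows its query returned on one host, for example collected through distributed queries, so a host only starts receiving the MySQL pack once `mysqld` shows up.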
  31. Production Logging Pipeline
      • Most organizations have an internal, high-performance production logging pipeline
         • Kolide uses GCP Pub/Sub, Facebook has Scribe, you may have Kafka, etc.
      • A variety of plugins already exist for tools like this
         • Plugins can be written in C++, Go, Python, JavaScript
      • The TLS plugin can still be used efficiently
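Whether logs arrive through the TLS plugin or a custom logger plugin, the hand-off to a high-throughput pipeline usually means batching. A minimal Python sketch follows, assuming newline-delimited JSON as the wire format; `publish` is a hypothetical stand-in for your pipeline's client (a Kafka producer, a Pub/Sub publisher, and so on), and the default batch size is arbitrary.

```python
import json
from typing import Callable, Iterable, List


def forward_in_batches(
    records: Iterable[dict],
    publish: Callable[[bytes], None],
    batch_size: int = 500,
) -> int:
    """Serialize records as newline-delimited JSON and publish them in batches.

    `publish` is called once per batch with the encoded payload.
    Returns the number of batches sent.
    """
    batch: List[str] = []
    sent = 0
    for record in records:
        batch.append(json.dumps(record, sort_keys=True))
        if len(batch) >= batch_size:
            publish("\n".join(batch).encode("utf-8"))
            sent += 1
            batch = []
    if batch:  # flush the final partial batch
        publish("\n".join(batch).encode("utf-8"))
        sent += 1
    return sent
```

Batching trades a little latency for far fewer publish calls; a production version would also flush on a timer and retry failed publishes.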
  32. Step 5: Complete Production Deployment
      Progressively roll out osquery across all of production
  33. Config Management
      • Add osquery to the base recipe/image for your environment
      • Ensure that config management is uniform and use it uniformly
      • Work with Ops and IT to ensure that assets are tracked and configured reliably
      • Understand what should be there and verify it
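"Understand what should be there and verify it" can start as a set difference between your asset inventory and the hosts actually enrolled in osquery. A minimal Python sketch, assuming both sides use the same identifier (for example the `hostIdentifier` from the logs); the function name `reconcile` is hypothetical.

```python
from typing import Dict, Iterable, Set


def reconcile(expected_hosts: Iterable[str], enrolled_hosts: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare an asset inventory against the hosts enrolled in osquery."""
    expected = set(expected_hosts)
    enrolled = set(enrolled_hosts)
    return {
        "missing": expected - enrolled,     # should be reporting but isn't
        "unexpected": enrolled - expected,  # reporting but not in inventory
    }
```

Both buckets are worth alerting on: "missing" hosts point at gaps in config management, while "unexpected" ones point at gaps in asset tracking.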
  34. Safety Features
      • Osquery includes a number of safety features to limit its ability to harm the workloads in an environment
         • A worker/watcher multi-process model ensures that resource utilization stays within acceptable bounds
      • New queries introduce new executable code across your environment
         • Introduce queries to “shards” 0-100 of your environment
         • Shard 10 contains 10% of hosts, etc.
         • Monitor query resource utilization via internal metrics tables
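The shard idea relies on each host landing in a stable bucket from 0 to 99, so that "shard 10" means roughly the first 10% of hosts and every host in shard 5 is also in shard 10. A minimal Python sketch follows; osquery packs accept a `shard` key for this purpose, but the SHA-256 hashing here is an assumed stand-in, not osquery's own shard computation, and the function names are hypothetical.

```python
import hashlib


def host_shard(host_identifier: str) -> int:
    """Map a host to a stable shard in the range 0-99.

    Assumption: SHA-256 of the identifier modulo 100; an illustrative
    stand-in, not osquery's own shard computation.
    """
    digest = hashlib.sha256(host_identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(host_identifier: str, shard_limit: int) -> bool:
    """True if the host is inside a rollout covering `shard_limit` percent of hosts.

    Rollouts are nested: any host included at a lower limit stays included
    as the limit grows.
    """
    return host_shard(host_identifier) < shard_limit
```

Raising the limit step by step (1, 10, 50, 100) while watching resource usage lets a costly new query surface on a small slice of the fleet first.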
  35. Step 6: Mergers and Acquisitions
      Be prepared to monitor new, unfamiliar environments
  36. Have Packages Ready
      • M&As will introduce new prod and corp environments
      • Be flexible and invest in packaging tooling
      • Package redistribution may not be easy after deployment
         • Consider using an autoupdate tool like Kolide Launcher
  37. TLS on the Edge
      • Don’t be afraid to put your osquery server on the internet
      • Securing and scaling HTTP is a well-understood objective
      • Two options are available for authentication: PSK & client certificates
      • Use production load balancers if available
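For the client-certificate option, the server side needs a TLS context that refuses clients without a certificate from your CA, while the PSK option (the enroll secret) is checked in the application instead. A minimal Python sketch using the standard `ssl` module is below; the function name, the file paths you pass in, and the TLS 1.2 floor are assumptions. On the osquery side, the client certificate and key are supplied with the `--tls_client_cert` and `--tls_client_key` flags.

```python
import ssl
from typing import Optional


def make_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    client_ca: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context for an internet-facing osquery endpoint.

    If `client_ca` is given, clients must present a certificate signed by it;
    otherwise authentication is left to the enroll secret (PSK) that the
    application checks on enrollment.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocol versions
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)    # the server's own certificate
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)
        ctx.verify_mode = ssl.CERT_REQUIRED       # reject clients without a valid cert
    return ctx
```

If TLS terminates at a production load balancer instead, the same client-certificate requirement has to be configured there, since the application server never sees the handshake.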
  38. Categorizable Environments
      • Make heavy use of labels to categorize environments
      • Consider adding automation to have different groups of hosts connect to different endpoints
      • Be conscious of internal data privacy laws if applicable
  39. Conclusion & Take-Aways
      • Scalable intrusion detection is a commodity capability
      • Integrate with your internal tools
      • Don’t use the features that you don’t need
         • But don’t be afraid to use the features you do need
      • Start on the desktop to balance risk and reward
         • But don’t stop there
  40. Thank You
      Questions?