Starting, growing, and scaling your host intrusion detection efforts

Mike Arpaia
Co-Founder & CTO @ Kolide
https://twitter.com/mikearpaia

Today’s Agenda
• Walk through the lifecycle of introducing and scaling a host intrusion detection function at an organization
• Explore the minimal set of osquery features that are needed throughout the lifecycle of rolling out a deployment
• Focus on accomplishing appropriate organizational objectives unobtrusively
• Keep an eye on common pitfalls and gotchas all along the way
• Focus entirely on the open-source ecosystem

My HIDS Background
• Software Engineer @ Etsy
  • Built, deployed, and open-sourced a custom Mac HIDS
• Software Engineer @ Facebook
  • Designed, wrote, deployed, and open-sourced osquery
• Engineering Manager @ Facebook
  • Supported the intrusion detection infrastructure team
• Co-Founder & CTO @ Kolide
  • Building scalable software and large cloud infrastructure for SaaS intrusion detection

Open Source Disclaimer
• At Kolide, we open source a lot of the tools that we make for managing osquery deployments
• While we use osquery to deliver the insights in our cloud product, managing osquery is not what we’re trying to sell
• Managing your host instrumentation infrastructure should be a democratized commodity capability
• We’re not the only show in town with open-source tools for this stuff
  • Fleet Alternatives: SGT (Okta), Windmill (Heroku), Doorman (Marcin)
  • Launcher Alternatives: DIY!

Initial Assumptions
• This talk is all about introducing host intrusion detection at a company with roughly the following properties, values, and capabilities:
  • Mostly macOS desktops, with some Windows and Linux
  • All Linux in production (some bare metal, some cloud VMs, multiple providers)
  • Some Windows servers in a DMZ (domain controller, Exchange server)
  • Strong desire to self-host open-source software
  • No budget, no vendors, no closed-source tools
  • Established ability to deploy, configure, and update software in all environments
  • Some existing logging/alerting pipeline that has the ability to parse JSON

Roll-Out Strategy
• Start with your desktop environment
  • Higher likelihood of being where an attack starts
  • Desktop users will tell you if something is going wrong
• Take on more organizational risk as you gain operational experience
  • Start with your team first, then teams you work with
  • Ideally try to find teams you sit close to at first

Step-by-Step Deployment
• Step 1: Initial R&D
• Step 2: Remote Configuration
• Step 3: Desktop Deployment
• Step 4: Initial Production Deployment
• Step 5: Complete Production Deployment
• Step 6: Mergers and Acquisitions

Step 1: Initial R&D
Test a deployment on your local machine

Osquery Introduction
• Write SQL to articulate facts about operating system state
• Use the queries to monitor the system in configurable ways
  • How do the results of this query change over time?
  • What is the result of this query on all of my hosts?
  • Tell me the results of this query every 24 hours
• Poll and event-based model for emitting events
• Efficient in both production and corporate environments
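To make "how do the results of this query change over time?" concrete, here is a minimal sketch of the idea behind differential results: compare two runs of the same query and report the rows that were added or removed. This is an illustration under stated assumptions, not osquery's actual implementation; the function name `diff_results` and the user `mallory` are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


def diff_results(previous: List[Row], current: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (added, removed) rows between two runs of the same query."""

    def key(row: Row) -> Tuple[Tuple[str, str], ...]:
        # Rows are dicts, so build a hashable, order-independent key.
        return tuple(sorted(row.items()))

    previous_keys = {key(row) for row in previous}
    current_keys = {key(row) for row in current}
    added = [row for row in current if key(row) not in previous_keys]
    removed = [row for row in previous if key(row) not in current_keys]
    return added, removed


# Two hypothetical snapshots of "select username, uid from users".
yesterday = [{"username": "root", "uid": "0"}, {"username": "victor", "uid": "501"}]
today = [{"username": "root", "uid": "0"}, {"username": "mallory", "uid": "502"}]

added, removed = diff_results(yesterday, today)
print("added:", added)
print("removed:", removed)
```

Only the changed rows are reported, which is why a scheduled query over a mostly static table produces very little log volume.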

Configuration and Logs
• Config In, Logs Out
• Osquery uses a declarative JSON configuration
• You specify what queries you want to get logs for
• Osquery logs the results of the queries as requested
• Both configuration and logging are customizable

Use the Default Plugins
• How osquery acquires configuration and how it publishes logs are each configurable via “plugins” (filesystem, tls, etc.)
• Host intrusion detection is hard and can become a complex data problem as deployments grow
• Let’s use the filesystem to configure osquery and view logs
• Focus on finding data that will be useful in your pipeline

Example Config
    {
      "packs": {
        "desktop_monitoring": {
          "queries": {
            "users": {
              "query": "select * from users",
              "interval": 60
            }
          }
        }
      }
    }
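Hand-typed JSON is easy to break with a stray curly quote or a missing brace, so one option is to generate the configuration from code. The sketch below is a hedged example: `build_config` is a hypothetical helper, and it assumes the pack layout described in osquery's configuration documentation, where a pack's queries sit under a `queries` key.

```python
import json
from typing import Dict, Tuple


def build_config(pack_name: str, queries: Dict[str, Tuple[str, int]]) -> dict:
    """Build a config with one pack; queries maps name -> (sql, interval in seconds)."""
    pack_queries = {
        name: {"query": sql, "interval": interval}
        for name, (sql, interval) in queries.items()
    }
    return {"packs": {pack_name: {"queries": pack_queries}}}


config = build_config("desktop_monitoring", {"users": ("select * from users", 60)})
config_text = json.dumps(config, indent=2)

# Round-trip through the JSON parser to confirm the text is valid JSON.
assert json.loads(config_text) == config
print(config_text)
```

Writing `config_text` to the path your filesystem config plugin reads is left out here, since that path depends on your platform and flags.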

Example Logs
    {
      "name": "pack:desktop_monitoring:users",
      "hostIdentifier": "FA01680E-98CA-5557-8F59-7716ECFEE964",
      "calendarTime": "Sun Apr 1 20:04:45 2018 UTC",
      "unixTime": 1522613085,
      "decorations": {
        "osquery_version": "2.11.2"
      },
      "columns": {
        "description": "victor",
        "directory": "/Users/victor",
        "gid": "20",
        "gid_signed": "20",
        "shell": "/bin/zsh",
        "total_seconds": "371714",
        "uid": "501",
        "uid_signed": "501",
        "username": "victor",
        "uuid": "00A4617C-AC3A-4CEC-95F4-3143161820DA"
      },
      "action": "added"
    }
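A pipeline that can parse JSON only needs a few fields from each result to start alerting. The following is a minimal sketch, assuming the results log contains one JSON object per line with the field names shown in the example log; `ResultEvent` and `parse_result_lines` are hypothetical names, and the sample line is a shortened copy of the example.

```python
import json
from typing import Dict, Iterable, Iterator, NamedTuple


class ResultEvent(NamedTuple):
    query_name: str
    host: str
    action: str
    columns: Dict[str, str]


def parse_result_lines(lines: Iterable[str]) -> Iterator[ResultEvent]:
    """Yield one ResultEvent per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield ResultEvent(
            query_name=record["name"],
            host=record["hostIdentifier"],
            action=record["action"],
            columns=record.get("columns", {}),
        )


sample = (
    '{"name": "pack:desktop_monitoring:users", '
    '"hostIdentifier": "FA01680E-98CA-5557-8F59-7716ECFEE964", '
    '"unixTime": 1522613085, '
    '"columns": {"username": "victor", "uid": "501", "shell": "/bin/zsh"}, '
    '"action": "added"}'
)

events = list(parse_result_lines([sample, ""]))
for event in events:
    if event.action == "added":
        print(f"{event.host}: new row for {event.query_name}: {event.columns}")
```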

Getting value from the logs
• We’ve used the tools to articulate a configuration that produces results that are useful in our environment
• Local tools are great for screencasts but not for critical data pipelines
• Let’s explore how to distribute configuration, aggregate logs uniformly, and ingest logs into your pipeline

Step 2: Remote Configuration
Configure, track, and query multiple endpoints

Config Distribution Options
• Filesystem
  • Pro: Simple, reliable
  • Con: Requires mature configuration management tools
  • Con: Live Query and Logging still TBD
• TLS / gRPC
  • Pro: Many open-source options
  • Pro: Live Query and centralized logging usually included
  • Con: Server component required

Log Acquisition Options
• Filesystem
  • Pro: Simple, reliable
  • Con: Requires existing log forwarding tools on the endpoint
• TLS / gRPC
  • Pro: Single, well-understood way to receive logs and ingest them into your pipeline
  • Con: Server component required
• AWS
  • Pro: Robust logging pipeline integrations for users that are committed to AWS
  • Con: Requires AWS

Live Query Options
• TLS / gRPC
  • Really the only game in town
  • You can fake this with certain scheduled query options

Advice: Consider TLS / gRPC
• Many open-source options
• Easy to create a custom server if necessary
• Decouple distribution from configuration and logging
• Operational overhead is low
• Fancy live query UI, say what?
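To give a feel for why a custom server is approachable, here is a rough sketch of the three request handlers such a server needs: enroll, config, and log. It is written as plain functions over parsed JSON bodies so it runs without a network. The field names (`enroll_secret`, `host_identifier`, `node_key`, `node_invalid`, `data`) reflect my understanding of osquery's remote API documentation and should be checked against it; the secret, the in-memory stores, and the handler names are all hypothetical.

```python
import secrets
from typing import Dict, List

ENROLL_SECRET = "example-enroll-secret"  # hypothetical shared secret
NODE_KEYS: Dict[str, str] = {}           # node_key -> host identifier
RECEIVED_LOGS: List[dict] = []           # stand-in for your logging pipeline
CONFIG = {"schedule": {"users": {"query": "select * from users", "interval": 60}}}


def handle_enroll(body: dict) -> dict:
    """Issue a node key to a host that presents the right enroll secret."""
    if body.get("enroll_secret") != ENROLL_SECRET:
        return {"node_invalid": True}
    node_key = secrets.token_hex(16)
    NODE_KEYS[node_key] = body.get("host_identifier", "")
    return {"node_key": node_key, "node_invalid": False}


def handle_config(body: dict) -> dict:
    """Return the configuration to an enrolled host."""
    if body.get("node_key") not in NODE_KEYS:
        return {"node_invalid": True}
    return CONFIG


def handle_log(body: dict) -> dict:
    """Accept a batch of log entries from an enrolled host."""
    if body.get("node_key") not in NODE_KEYS:
        return {"node_invalid": True}
    RECEIVED_LOGS.extend(body.get("data", []))
    return {}


# Simulate one host enrolling, fetching config, and shipping a log entry.
enrolled = handle_enroll({"enroll_secret": ENROLL_SECRET, "host_identifier": "host-1"})
fetched = handle_config({"node_key": enrolled["node_key"]})
handle_log({"node_key": enrolled["node_key"], "log_type": "result",
            "data": [{"name": "users", "action": "added"}]})
print(fetched, RECEIVED_LOGS)
```

A real deployment would put these handlers behind HTTPS endpoints and persist node keys, which is what the open-source servers mentioned earlier already do.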

Setting Up Kolide Fleet

Your Osquery Configuration

Step 3: Desktop Deployment
Proliferate your packages in the desktop environment

Enrolling Hosts
• Make a package which, upon installation, will enroll a host with your osquery server
• Some folks use two packages: one for osquery binaries, one for initial configuration
  • See https://osquery.io/downloads for binary packages
• Others distribute a single package which combines the osquery binaries with initial configuration
  • See Kolide Launcher’s package-builder tool

Using Package Builder

Distributing Osquery
• Open-source options for various operating systems
  • macOS: Chef, Munki, MicroMDM
  • Windows: Chef, SCCM
  • Linux: Chef, Puppet, Ansible
• Kolide Launcher includes autoupdate capability
  • Update machinery uses TUF and Docker Notary

Write Queries and Alerts
• The desktop environment contains a lot of low-hanging fruit
• Use existing queries to find malicious activity
• Write your own and share them!
• Gain insight into misconfigured system states
• Establish a working, productive pipeline with internal tools

Step 4: Initial Production Deployment
With confidence from the Desktop environment, start deploying osquery to high-value production servers

Important Osquery Options
• Optimize the osquery query schedule
  • osqueryd --help | grep schedule
• Configure the event-based monitoring
  • osqueryd --help | grep events
• Configure the utilization watchdog
  • osqueryd --help | grep watchdog
• Configure osquery extensions
  • osqueryd --help | grep extensions

Packaging Osquery in Prod
• You will likely need to use an internal packaging environment
• Osquery publishes signed Linux packages and binaries
• Statically distribute osquery and/or launcher in a container
• Do what makes sense in your production environment

Behavioral Configuration
• Use labels to gate the distribution of packs
• Hosts only join labels when they start exhibiting certain behaviors
  • Only monitor for certain remote exploitation techniques if hosts are binding to a TCP port
  • Only monitor for MySQL misconfiguration if MySQL is running on the host
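The label mechanism can be sketched as follows: a label is a query, a host joins the label when that query returns at least one row, and a pack is only distributed to hosts in its target label. This is an illustrative model, not the code of any particular osquery server. The label names, pack names, and SQL strings are hypothetical, and the query results are passed in as data instead of being executed.

```python
from typing import Dict, List, Set

# Hypothetical label definitions: a host joins when the query returns rows.
LABEL_QUERIES: Dict[str, str] = {
    "mysql_running": "select 1 from processes where name = 'mysqld';",
    "listening_tcp": "select 1 from listening_ports where protocol = 6 and port != 0;",
}

# Hypothetical mapping of pack name -> label that gates its distribution.
PACK_TARGETS: Dict[str, str] = {
    "mysql_misconfiguration": "mysql_running",
    "remote_exploitation": "listening_tcp",
}


def labels_for_host(label_results: Dict[str, List[dict]]) -> Set[str]:
    """A host is a member of every label whose query returned at least one row."""
    return {label for label, rows in label_results.items() if rows}


def packs_for_host(label_results: Dict[str, List[dict]]) -> List[str]:
    """Return the packs this host should receive, given its label query results."""
    labels = labels_for_host(label_results)
    return sorted(pack for pack, label in PACK_TARGETS.items() if label in labels)


# A database host: mysqld is running, and nothing else matched.
db_host = {"mysql_running": [{"1": "1"}], "listening_tcp": []}
# A host that exhibits neither behavior receives no gated packs.
idle_host = {"mysql_running": [], "listening_tcp": []}

print(packs_for_host(db_host))
print(packs_for_host(idle_host))
```

Because membership is re-evaluated as label queries run, a host that later starts MySQL begins receiving the MySQL pack without any manual retargeting.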

Production Logging Pipeline
• Most organizations have an internal, high-performance production logging pipeline
• Kolide uses GCP Pub/Sub, Facebook has Scribe, you may have Kafka, etc.
• A variety of plugins already exist for tools like this
• Plugins can be written in C++, Go, Python, JavaScript
• TLS plugin can still be used efficiently

Step 5: Complete Production Deployment
Progressively roll out osquery across all of production

Config Management
• Add osquery to the base recipe/image for your environment
• Ensure that config management is uniform and use it uniformly
• Work with Ops and IT to ensure that assets are tracked and configured reliably
• Understand what should be there and verify it

Safety Features
• Osquery includes a number of safety features to limit its ability to harm the workloads in an environment
• Worker/watcher multi-process model that ensures resource utilization stays within acceptable bounds
• New queries introduce new executable code across your environment
  • Introduce queries to “shards” 0-100 of your environment
  • Shard 10 contains 10% of hosts, etc.
• Monitor query resource utilization via internal metrics tables
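The sharding idea can be illustrated with a deterministic hash: each host lands in a stable bucket from 1 to 100, and a query or pack with shard N applies only to hosts whose bucket is at most N. This is a sketch of the concept under that assumption; osquery's own hashing scheme may differ, and `shard_bucket`, `pack_applies`, and the host names are hypothetical.

```python
import hashlib


def shard_bucket(host_identifier: str) -> int:
    """Map a host identifier to a stable bucket in the range 1..100."""
    digest = hashlib.sha256(host_identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100 + 1


def pack_applies(host_identifier: str, shard: int) -> bool:
    """A shard of N applies to roughly N percent of hosts; 100 applies to all."""
    return shard_bucket(host_identifier) <= shard


hosts = [f"host-{i}" for i in range(1000)]
in_shard_10 = [host for host in hosts if pack_applies(host, 10)]
print(f"{len(in_shard_10)} of {len(hosts)} hosts would run a shard-10 query")
```

Because the bucket depends only on the host identifier, raising the shard from 10 to 25 keeps the original hosts and adds new ones, which makes a staged roll-out easy to reason about.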

Step 6: Mergers and Acquisitions
Be prepared to monitor new, unfamiliar environments

Have Packages Ready
• M&As will introduce new prod and corp environments
• Be flexible and invest in packaging tooling
• Package redistribution may not be easy after deployment
• Consider using an autoupdate tool like Kolide Launcher

TLS on the Edge
• Don’t be afraid to put your osquery server on the internet
• Securing and scaling HTTP is a well-understood objective
• Two options available for authentication: PSK & Client Cert
• Use production load balancers if available
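The two authentication options can be sketched with the Python standard library: a constant-time comparison for a pre-shared enroll secret, and a server-side TLS context that demands a client certificate when a CA bundle is supplied. This is a hedged illustration, not how any specific osquery server implements it; the function names and the secret values are hypothetical, and loading the server's own certificate with `load_cert_chain` is omitted.

```python
import hmac
import ssl
from typing import Optional


def enroll_secret_matches(presented: str, expected: str) -> bool:
    """PSK option: compare secrets in constant time to avoid timing leaks."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def build_server_context(client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-cert option: require a certificate signed by the given CA, if any."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    if client_ca_file is not None:
        # Only clients presenting a cert that chains to this CA can connect.
        context.load_verify_locations(cafile=client_ca_file)
        context.verify_mode = ssl.CERT_REQUIRED
    return context


# Without a CA file the context accepts any client and relies on the PSK alone.
psk_only_context = build_server_context()
print(enroll_secret_matches("example-secret", "example-secret"))
print(psk_only_context.verify_mode == ssl.CERT_NONE)
```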

Categorizable Environments
• Make heavy use of labels to categorize environments
• Consider adding automation to have different groups of hosts connect to different endpoints
• Be conscious of internal data privacy laws if applicable

Conclusion & Take-Aways
• Scalable intrusion detection is a commodity capability
• Integrate with your internal tools
• Don’t use the features that you don’t need
  • But don’t be afraid to use the features you do need
• Start on the Desktop to balance risk and reward
  • But don’t stop there

Thank You
Questions?
• https://osquery.io
• https://kolide.com/fleet
• https://kolide.com/launcher
• https://github.com/kolide
• https://blog.kolide.com
