
Scaling osquery with osctrl


Understanding how the osquery remote API works is the first step toward building scalable software around it. We will discuss how to go from just a few nodes to tens of thousands while keeping everything working smoothly.

None of the secret sauce is actually secret, since it’s still open source!

Osquery (https://osquery.io) is an open source tool, originally developed by the Facebook Security team and now managed by its own community as part of the Linux Foundation. It lets you run a SQL engine on top of your operating system and use SQL queries to extract information about the health state and changes of the systems in your networks. The tool enhances the incident response capabilities of a security team. To use it in a large enterprise network, it is important to understand the differences between the osquery shell (osqueryi) and the osquery daemon (osqueryd), and in particular the remote API of osquery.


One way to enhance the detection capabilities of osquery is osctrl (https://osctrl.net), a fast and efficient osquery management solution. It implements the osquery remote API as a TLS endpoint and lets you monitor, configure and interact with all the production or corporate assets that use osquery for host instrumentation.

Osctrl has been used in corporate and production environments with thousands of nodes, thanks to its ability to scale and provide a reliable solution. Its architecture is key to scaling, whether the environment is cloud, virtualized, container-based or even bare metal.


Javier Marcos

December 13, 2020


Transcript

  1. $ whoami
     Javier Marcos de Prado, Staff Security Engineer
     @javutin, javuto
  2. Agenda
     ▪ Quick overview of osquery
     ▪ The osquery remote API
     ▪ Building and scaling your TLS endpoint
     ▪ osctrl as example of a TLS endpoint
     ▪ Conclusions and lessons learned
  3. The daemon - osqueryd (single host)
     Diagram labels: operating system, users, services; osqueryd; configuration; logging; centralized management (backend); intrusion detection use cases
     https://osquery.readthedocs.io/en/stable/introduction/using-osqueryd/
  4. osqueryd logging
     ➔ Local logging
       ❏ Forwarders
         ◦ Logstash, Splunk...
     ➔ Remote logging
       ❏ Kinesis
       ❏ Kafka
       ❏ Splunk
       ❏ TLS endpoint
  5. osqueryd configuration
     ➔ Local configuration
       ❏ IT/infra management
         ◦ Chef, Puppet, Jamf, Ansible...
     ➔ Remote configuration
       ❏ TLS endpoint
  6. osquery remote API
     ▪ Enroll: POST /path/to/enroll
     ▪ Configuration: POST /path/to/config
     ▪ Logs: POST /path/to/log
     ▪ Extras: on-demand queries, file carving, ...
     https://osquery.readthedocs.io/en/stable/deployment/remote/
  7. osquery remote API: Enroll
     POST /path/to/enroll
     {
       "enroll_secret": "...", // Optional.
       "host_identifier": "...", // --host_identifier flag
       "host_details": { // Helpful osquery tables.
         "os_version": {},
         "osquery_info": {},
         "system_info": {},
         "platform_info": {}
       }
     }
  8. osquery remote API: Enroll
     HTTP RESPONSE
     {
       "node_key": "...", // Optionally blank
       "node_invalid": false // Optional, true to indicate failure.
     }
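
The enroll exchange in slides 7 and 8 could be handled along these lines. This is only a minimal sketch: `ENROLL_SECRET`, `NODES` and `handle_enroll` are hypothetical names, the in-memory dictionary stands in for a real backend, and the HTTP/TLS layer is left out so that only the request-to-response logic is shown.

```python
import secrets

# Hypothetical values for illustration only: a real endpoint would load the
# secret from its configuration and keep enrolled nodes in a backend.
ENROLL_SECRET = "example-secret"
NODES = {}  # node_key -> host_identifier


def handle_enroll(body: dict) -> dict:
    """Build the enroll response for an already parsed JSON request body."""
    if body.get("enroll_secret") != ENROLL_SECRET:
        # Blank node_key plus node_invalid=true signals a failed enrollment.
        return {"node_key": "", "node_invalid": True}
    node_key = secrets.token_hex(16)  # random 32-character key for this node
    NODES[node_key] = body.get("host_identifier", "")
    return {"node_key": node_key, "node_invalid": False}


ok = handle_enroll({"enroll_secret": "example-secret", "host_identifier": "host-1"})
bad = handle_enroll({"enroll_secret": "wrong"})
print(ok["node_invalid"], bad["node_invalid"])
```

The node key returned here is what the node sends back in every later request, so the endpoint can tell nodes apart without seeing the enroll secret again.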
  9. osquery remote API: Configuration
     HTTP RESPONSE
     {
       "schedule": {
         "query_name": {
           "query": "...",
           "interval": 10
         }
       },
       "node_invalid": false // Optional, true for re-enrollment.
     }
  10. osquery remote API: Logs
      POST /path/to/log
      {
        "node_key": "...", // Optionally blank
        "log_type": "result", // Either "result" or "status"
        "data": [
          {...} // Each result event, or status event
        ]
      }
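
A log handler for the request in slide 10 might look like the sketch below. The names `KNOWN_NODES`, `STORED` and `handle_log` are hypothetical, storage is a plain in-memory dictionary, and replying with `node_invalid` is an assumption that mirrors the other responses in the slides rather than something the log slide shows.

```python
from collections import defaultdict

# Hypothetical state: node keys issued at enroll time, and stored log entries.
KNOWN_NODES = {"node-key-1"}
STORED = defaultdict(list)  # log_type -> list of entries


def handle_log(body: dict) -> dict:
    """Store result or status entries sent by an enrolled node."""
    if body.get("node_key") not in KNOWN_NODES:
        # Assumption: ask unknown nodes to re-enroll, as other responses do.
        return {"node_invalid": True}
    log_type = body.get("log_type")
    if log_type in ("result", "status"):
        STORED[log_type].extend(body.get("data", []))
    return {"node_invalid": False}


accepted = handle_log(
    {"node_key": "node-key-1", "log_type": "result", "data": [{"name": "q1"}]}
)
rejected = handle_log({"node_key": "unknown", "log_type": "status", "data": []})
print(accepted, rejected)
```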
  11. osquery remote API: On-demand queries (read)
      POST /path/to/query-read
      {
        "node_key": "..." // Optionally blank
      }
  12. osquery remote API: On-demand queries (read)
      HTTP RESPONSE
      {
        "queries": {
          "id1": "SELECT * FROM osquery_info;",
          "id2": "SELECT * FROM osquery_schedule;",
          "id3": "SELECT * FROM does_not_exist;"
        },
        "node_invalid": false // Optional, true for re-enrollment.
      }
  13. osquery remote API: On-demand queries (write)
      POST /path/to/query-write
      {
        "node_key": "...",
        "queries": {
          "id1": [
            {"column1": "value1", "column2": "value2"}
          ]
        },
        "statuses": {
          "id1": 0
        }
      }
  14. osquery remote API: On-demand queries (write)
      HTTP RESPONSE
      {
        "node_invalid": false // Optional, true for re-enrollment.
      }
  15. osquery remote API: Flags
      ▪ Enroll
        --enroll_tls_endpoint
      ▪ Configuration
        --config_tls_endpoint
        --config_tls_refresh
      ▪ Log
        --logger_tls_endpoint
        --logger_tls_period
      ▪ Queries
        --distributed_tls_[read-write]_endpoint
        --distributed_interval
  16. Building a TLS endpoint
      ▪ Handler for Enroll ✅
      ▪ Handler for Configuration ✅
      ▪ Handlers for Logs ✅
      ▪ Handlers for extras (on-demand queries, file carving, ...) ✅
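
As an illustration of the "extras" handlers, the on-demand query exchange from the read and write slides could be sketched as a pair of functions around a pending-query queue. Everything here is hypothetical (`PENDING`, `RESULTS`, `schedule_query` and both handlers), state is kept in memory, and treating a non-zero status as a failed query is the convention assumed by this sketch.

```python
PENDING = {}  # node_key -> {query_id: sql}, filled by an operator
RESULTS = {}  # (node_key, query_id) -> {"status": int, "rows": list}


def schedule_query(node_key: str, query_id: str, sql: str) -> None:
    """Queue a query so the node receives it on its next read request."""
    PENDING.setdefault(node_key, {})[query_id] = sql


def handle_query_read(body: dict) -> dict:
    """Hand out all pending queries for the node once, then clear them."""
    queries = PENDING.pop(body.get("node_key"), {})
    return {"queries": queries, "node_invalid": False}


def handle_query_write(body: dict) -> dict:
    """Record rows and status for every query the node reports back."""
    node_key = body.get("node_key")
    queries = body.get("queries", {})
    statuses = body.get("statuses", {})
    for query_id in set(queries) | set(statuses):
        RESULTS[(node_key, query_id)] = {
            "status": statuses.get(query_id),  # 0 is success in this sketch
            "rows": queries.get(query_id, []),
        }
    return {"node_invalid": False}


schedule_query("node-key-1", "id1", "SELECT * FROM osquery_info;")
first_read = handle_query_read({"node_key": "node-key-1"})
second_read = handle_query_read({"node_key": "node-key-1"})
handle_query_write(
    {
        "node_key": "node-key-1",
        "queries": {"id1": [{"column1": "value1"}]},
        "statuses": {"id1": 0},
    }
)
print(first_read["queries"], second_read["queries"])
```

Clearing the queue on read keeps a node from running the same query twice, at the cost of losing the query if the response never reaches the node; a real endpoint may prefer to expire queries after a deadline instead.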
  17. Scaling a TLS endpoint
      Requests reaching the TLS endpoint:
      - Configuration
      - Logs
      - On-demand queries
      - (Enroll)
  18. Scaling a TLS endpoint
      1 x
        --config_tls_refresh=60
        --logger_tls_period=60
        --distributed_interval=60
      = 3 requests per minute
  19. Scaling a TLS endpoint
      Timeline of intervals from 0 to 600 seconds: QUERY every 100 seconds, CONFIG every 300 seconds, LOG every 600 seconds
  20. Scaling a TLS endpoint
      sum(highest_interval / each_interval)
      600 / 600 = 1
      600 / 300 = 2
      600 / 100 = 6
      9 requests per 600 seconds
      9 / 600 = 0.015 per second; for 1000 nodes, ~15 per second
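
The arithmetic in slides 18 and 20 can be reproduced with a few lines of Python. The function name `requests_per_second` and its parameters are made up for this sketch, and the result is an average: it ignores enroll requests and any bursts.

```python
def requests_per_second(intervals, nodes=1):
    """Average request rate for nodes that each poll on the given intervals.

    Follows the slide's formula: sum(highest_interval / each_interval)
    requests per highest_interval seconds, multiplied by the node count.
    """
    highest = max(intervals)
    per_cycle = sum(highest / each for each in intervals)
    return nodes * per_cycle / highest


# Slide 20: log, config and query intervals of 600, 300 and 100 seconds.
rate_1000_nodes = requests_per_second([600, 300, 100], nodes=1000)

# Slide 18: one node with all three intervals set to 60 seconds.
per_minute_single_node = requests_per_second([60, 60, 60]) * 60

print(rate_1000_nodes, per_minute_single_node)
```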
  21. Scaling a TLS endpoint: Caveats
      ▪ Don’t forget enroll! (N requests at T0)
      ▪ Query writes?
      ▪ File carving?
      ▪ Accelerated mode?
      ▪ All those intervals are NOT splayed ➔ peaks
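
To see why unsplayed intervals matter, the sketch below compares the busiest second when every node starts at the same instant with a hypothetical case where start times are staggered evenly across 100 seconds. The function `peak_per_second`, the node count and the offsets are illustrative assumptions, not measurements, and the staggering is a what-if comparison rather than something osquery does for these intervals.

```python
from collections import Counter


def peak_per_second(offsets, intervals, horizon):
    """Largest number of requests that land in the same second.

    Each node starts at its offset and then sends one request per interval
    until `horizon` seconds after its start.
    """
    hits = Counter()
    for offset in offsets:
        for interval in intervals:
            for t in range(offset + interval, offset + horizon + 1, interval):
                hits[t] += 1
    return max(hits.values())


nodes = 1000
intervals = [600, 300, 100]  # log, config and query intervals from the slides

# Worst case: all nodes start together, so their timers fire together.
aligned_peak = peak_per_second([0] * nodes, intervals, horizon=600)

# Hypothetical comparison: start times spread evenly over 100 seconds.
staggered_peak = peak_per_second([n % 100 for n in range(nodes)], intervals, horizon=600)

print(aligned_peak, staggered_peak)
```

With these inputs the aligned case puts 3000 requests into a single second, against 30 in the staggered case, even though the average rate is about 15 per second in both. Capacity planning against the average alone would miss that peak.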
  22. osctrl components (diagram labels): osquery, vpn, osctrl-tls, osctrl-admin / osctrl-api, osctrl-cli, operator, backend, metrics; flows: Status / Results / Logs, Configuration
      https://osctrl.net
  23. osctrl
      ➔ Monitor osquery agents ✅
      ➔ Collect and process status/result logs ✅
      ➔ Distribute osquery configuration fast ✅
      ➔ Run on-demand queries ✅
      ➔ Extract files/directories using file carves ✅
  24. Future work in osctrl
      ▪ More functionality in osctrl-api
      ▪ More functionality in osctrl-cli
      ▪ Query packs, dashboard, graphics!
      ▪ Functionality based on tags
      ▪ Better backend usage to avoid CPU spikes
      https://github.com/jmpsec/osctrl/issues
  25. Conclusions / lessons learned
      ▪ Buy a solution vs. build a solution
      ▪ Always plan for the worst-case scenario
      ▪ If you are using cloud, make the most of it
      ▪ Once you have logs, don’t forget them!
      ▪ Read code for hidden/undocumented features