Scaling osquery with pure OpenSource technologies

Understanding how the osquery remote API works is the first step toward building scalable software around it. We will discuss how to go from just a few nodes to tens of thousands while keeping everything working smoothly. None of the secret sauce is actually secret, since it’s all open source!

Original presentation: https://docs.google.com/presentation/d/1itQmFSvrAT4wTcLxRb43SAdQcWLF7gq5bSihrmNAfm0/edit?usp=sharing

Javier Marcos

January 22, 2020

Transcript

  1. $ whoami

    Javier Marcos de Prado, Staff Security Engineer @ ABS Global Trading
    ➔ Former: ➔ Current:
    @javutin javuto
    Page 2 @osqueryatscale @javutin
  2. Agenda

    ▪ Quick overview of osquery
    ▪ The osquery remote API
    ▪ Building and scaling your TLS endpoint
    ▪ osctrl as example of a TLS endpoint
    ▪ Conclusions and lessons learned
  3. The CLI - osqueryi (single host)

    https://osquery.readthedocs.io/en/stable/introduction/using-osqueryi/
  4. The daemon - osqueryd (single host)

    https://osquery.readthedocs.io/en/stable/introduction/using-osqueryd/
    Diagram labels: intrusion detection use cases; centralized management (backend); operating system, users, services; configuration; logging; osqueryd
  5. The daemon - osqueryd (multiple hosts)

    https://osquery.readthedocs.io/en/stable/introduction/using-osqueryd/
    ...
  6. osqueryd logging

    https://osquery.readthedocs.io/en/stable/introduction/using-osqueryd/
    ➔ Local logging
      ❏ Forwarders
        ◦ Logstash, Splunk...
    ➔ Remote logging
      ❏ Kinesis
      ❏ Kafka
      ❏ Splunk
      ❏ TLS endpoint
  7. osqueryd configuration

    https://osquery.readthedocs.io/en/stable/introduction/using-osqueryd/
    ➔ Local configuration
      ❏ IT/infra management
        ◦ Chef, puppet, jamf, ansible...
    ➔ Remote configuration
      ❏ TLS endpoint
  8. osquery remote API

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    ▪ Enroll: POST /path/to/enroll
    ▪ Configuration: POST /path/to/config
    ▪ Logs: POST /path/to/log
    ▪ Extras: (On-demand queries) (File carving) ...
  9. osquery remote API: Enroll

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    POST /path/to/enroll
    {
      "enroll_secret": "...", // Optional.
      "host_identifier": "...", // --host_identifier flag
      "host_details": { // Helpful osquery tables.
        "os_version": {},
        "osquery_info": {},
        "system_info": {},
        "platform_info": {}
      }
    }
  10. osquery remote API: Enroll

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    HTTP RESPONSE
    {
      "node_key": "...", // Optionally blank
      "node_invalid": false // Optional, true to indicate failure.
    }
  11. osquery remote API: Configuration

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    HTTP RESPONSE
    {
      "schedule": {
        "query_name": {
          "query": "...",
          "interval": 10
        }
      },
      "node_invalid": false // Optional, true for re-enrollment.
    }
  12. osquery remote API: Logs

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    POST /path/to/log
    {
      "node_key": "...", // Optionally blank
      "log_type": "result", // Either "result" or "status"
      "data": [
        {...} // Each result event, or status event
      ]
    }
  13. osquery remote API: On-demand queries (read)

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    POST /path/to/query-read
    {
      "node_key": "..." // Optionally blank
    }
  14. osquery remote API: On-demand queries (read)

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    HTTP RESPONSE
    {
      "queries": {
        "id1": "SELECT * FROM osquery_info;",
        "id2": "SELECT * FROM osquery_schedule;",
        "id3": "SELECT * FROM does_not_exist;"
      },
      "node_invalid": false // Optional, true for re-enrollment.
    }
  15. osquery remote API: On-demand queries (write)

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    POST /path/to/query-write
    {
      "node_key": "...",
      "queries": {
        "id1": [
          {"column1": "value1", "column2": "value2"}
        ]
      },
      "statuses": {
        "id1": 0
      }
    }
  16. osquery remote API: On-demand queries (write)

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    HTTP RESPONSE
    {
      "node_invalid": false // Optional, true for re-enrollment.
    }
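The read/write exchange on the four slides above can be sketched as a pair of handlers. This is only an illustration built on the JSON bodies shown in the slides: the in-memory `PENDING` and `RESULTS` stores, the example node key, and the function names are all hypothetical, and a real endpoint would sit behind an HTTPS server and a database.

```python
import json

# Hypothetical in-memory state: queries waiting per node key, and results
# keyed by (node_key, query_id). A real endpoint would use a database.
PENDING = {"example-node-key": {"id1": "SELECT * FROM osquery_info;"}}
RESULTS = {}


def _known_node(request: dict):
    """Return the node key if it is a string we know about, else None."""
    node_key = request.get("node_key")
    if isinstance(node_key, str) and node_key in PENDING:
        return node_key
    return None


def handle_query_read(body: str) -> str:
    """POST /path/to/query-read: hand out pending queries once, then clear them."""
    node_key = _known_node(json.loads(body))
    if node_key is None:
        return json.dumps({"node_invalid": True})
    queries, PENDING[node_key] = PENDING[node_key], {}
    return json.dumps({"queries": queries, "node_invalid": False})


def handle_query_write(body: str) -> str:
    """POST /path/to/query-write: store rows and status for each query id."""
    request = json.loads(body)
    node_key = _known_node(request)
    if node_key is None:
        return json.dumps({"node_invalid": True})
    statuses = request.get("statuses", {})
    for query_id, rows in request.get("queries", {}).items():
        RESULTS[(node_key, query_id)] = {
            "status": statuses.get(query_id),
            "rows": rows,
        }
    return json.dumps({"node_invalid": False})
```

Clearing the pending queries on read is one possible design choice: each node receives an on-demand query once, and the write handler later matches results back by query id.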
  17. osquery remote API: Flags

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    ▪ Enroll: --enroll_tls_endpoint
    ▪ Configuration: --config_tls_endpoint, --config_tls_refresh
    ▪ Log: --logger_tls_endpoint, --logger_tls_period
    ▪ Queries: --distributed_tls_[read-write]_endpoint, --distributed_interval
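To make the flag list concrete, here is a small sketch that renders an osquery flagfile from the flags on this slide. The `render_flagfile` helper and the example hostname are hypothetical, the paths are the placeholder paths used in the earlier slides, and `--tls_hostname` is added from the osquery documentation rather than from the slide. Plugin selection and certificate flags are left out, so this is not a complete configuration.

```python
def render_flagfile(tls_hostname: str, config_refresh: int,
                    logger_period: int, distributed_interval: int) -> str:
    """Render one --flag=value line per remote API flag (hypothetical helper)."""
    flags = [
        ("tls_hostname", tls_hostname),  # not on the slide; from osquery docs
        ("enroll_tls_endpoint", "/path/to/enroll"),
        ("config_tls_endpoint", "/path/to/config"),
        ("config_tls_refresh", config_refresh),
        ("logger_tls_endpoint", "/path/to/log"),
        ("logger_tls_period", logger_period),
        ("distributed_tls_read_endpoint", "/path/to/query-read"),
        ("distributed_tls_write_endpoint", "/path/to/query-write"),
        ("distributed_interval", distributed_interval),
    ]
    return "".join(f"--{name}={value}\n" for name, value in flags)


# Example: hypothetical hostname with intervals of 300, 600 and 100 seconds.
print(render_flagfile("tls.example.com", 300, 600, 100), end="")
```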
  18. Building a TLS endpoint

    https://osquery.readthedocs.io/en/stable/deployment/remote/
    ▪ Handler for Enroll ✅
    ▪ Handler for Configuration ✅
    ▪ Handlers for Logs ✅
    ▪ Handlers for extras (On-demand queries) (File carving) ... ✅
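The first three handlers on this slide could look roughly like the sketch below, which follows the request and response fields from the Enroll, Configuration and Logs slides. Everything else is an assumption made for illustration: the shared secret, the schedule contents, the in-memory `NODES` and `LOGS` stores, the `dispatch` function, and the `node_invalid` reply to a log request. It is not how osctrl or any particular backend implements them, and HTTP/TLS serving and error handling are omitted.

```python
import hmac
import json
import secrets

# Hypothetical values for illustration only. A real endpoint would load the
# secret from configuration and keep nodes and logs in a database.
ENROLL_SECRET = "example-enroll-secret"
SCHEDULE = {"osquery_info": {"query": "SELECT * FROM osquery_info;", "interval": 300}}
NODES = {}  # node_key -> host_identifier
LOGS = {"result": [], "status": []}


def is_enrolled(request: dict) -> bool:
    """True when the request carries a node key issued by handle_enroll."""
    node_key = request.get("node_key")
    return isinstance(node_key, str) and node_key in NODES


def handle_enroll(request: dict) -> dict:
    """POST /path/to/enroll: check the secret and hand out a node key."""
    supplied = str(request.get("enroll_secret", "")).encode()
    if not hmac.compare_digest(supplied, ENROLL_SECRET.encode()):
        return {"node_key": "", "node_invalid": True}
    node_key = secrets.token_hex(16)
    NODES[node_key] = request.get("host_identifier", "")
    return {"node_key": node_key, "node_invalid": False}


def handle_config(request: dict) -> dict:
    """POST /path/to/config: return the schedule to enrolled nodes only."""
    if not is_enrolled(request):
        return {"node_invalid": True}
    return {"schedule": SCHEDULE, "node_invalid": False}


def handle_log(request: dict) -> dict:
    """POST /path/to/log: append result or status events to the matching list."""
    if not is_enrolled(request):
        return {"node_invalid": True}
    log_type = request.get("log_type")
    if isinstance(log_type, str) and log_type in LOGS:
        LOGS[log_type].extend(request.get("data", []))
    return {"node_invalid": False}


HANDLERS = {
    "/path/to/enroll": handle_enroll,
    "/path/to/config": handle_config,
    "/path/to/log": handle_log,
}


def dispatch(path: str, body: str) -> str:
    """Decode the JSON body, call the handler for the path, encode the reply."""
    return json.dumps(HANDLERS[path](json.loads(body)))
```

Answering an unknown node key with `"node_invalid": true` matches the slides' note that this value triggers re-enrollment, so a node that lost its key recovers without operator action.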
  19. Scaling a TLS endpoint

    TLS endpoint
    - Configuration
    - Logs
    - On-demand queries
    - (Enroll)
  20. Scaling a TLS endpoint

    1 x
    --config_tls_refresh=60
    --logger_tls_period=60
    --distributed_interval=60
    = 3 requests per minute
  21. Scaling a TLS endpoint

    N x
    --config_tls_refresh=300
    --logger_tls_period=600
    --distributed_interval=100
    =
  22. Scaling a TLS endpoint

    Timeline diagram of intervals from 0 to 600 seconds: LOG every 600, CONFIG every 300, QUERY every 100.
  23. Scaling a TLS endpoint

    sum(highest_interval / each_interval)
    600 / 600 = 1
    600 / 300 = 2
    600 / 100 = 6
    9 requests per 600 seconds
    9 / 600 = 0.015 per second
    For 1000 nodes, ~15 per second
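The arithmetic on this slide can be written as a short function. The function name and its `nodes` parameter are made up for this sketch. It computes the steady-state average only and, like the slide, ignores enrollment, query writes and file carving.

```python
def requests_per_second(intervals, nodes=1):
    """Average request rate for nodes that each poll on the given intervals.

    Implements sum(highest_interval / each_interval) requests per
    highest_interval seconds, scaled by the number of nodes.
    """
    window = max(intervals)
    requests_in_window = sum(window / interval for interval in intervals)
    return nodes * requests_in_window / window


# Intervals from the slide: logger 600 s, config 300 s, distributed 100 s.
print(requests_per_second([600, 300, 100]))              # 0.015
print(requests_per_second([600, 300, 100], nodes=1000))  # 15.0
```

The same result is `nodes * sum(1 / interval)`. The windowed form is kept here because it mirrors the slide's steps.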
  24. Scaling a TLS endpoint: Caveats

    ▪ Don’t forget enroll! (N requests at T0)
    ▪ Query writes?
    ▪ File carving?
    ▪ Accelerated mode?
    ▪ All those intervals are NOT splayed, so expect peaks
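The last caveat can be illustrated with a toy simulation comparing the busiest second when every node's timers start together at T0 against the case where start times are spread at random. This is a hypothetical model, not osquery behaviour: the function, its parameters and the random start offsets are assumptions chosen only to show why the average rate understates the peaks.

```python
import random
from collections import Counter


def peak_requests_per_second(nodes, intervals, horizon, spread_start, seed=0):
    """Return the request count of the busiest second within `horizon` seconds."""
    rng = random.Random(seed)
    per_second = Counter()
    for _ in range(nodes):
        for interval in intervals:
            # Without spreading, every timer fires at T0 and then in lockstep.
            offset = rng.randrange(interval) if spread_start else 0
            for second in range(offset, horizon, interval):
                per_second[second] += 1
    return max(per_second.values())


# 1000 nodes with the intervals from the earlier slide, over 600 seconds.
aligned = peak_requests_per_second(1000, [600, 300, 100], 600, spread_start=False)
spread = peak_requests_per_second(1000, [600, 300, 100], 600, spread_start=True)
print(aligned, spread)
```

With aligned timers the busiest second carries 3000 requests (three per node at T0), far above the ~15 per second average. With spread start times the peak stays much closer to that average.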
  25. osctrl

    Diagram labels: osctrl-admin / osctrl-api; Status / Results / Logs; Configuration; osctrl-tls; operator; backend; metrics; osctrl-cli; osquery; vpn
    https://osctrl.net
  26. osctrl

    ➔ Monitor osquery agents ✅
    ➔ Collect and process status/result logs ✅
    ➔ Distribute osquery configuration fast ✅
    ➔ Run on-demand queries ✅
    ➔ Extract files/directories using file carves ✅
    https://osctrl.net
  27. Conclusions / lessons learned

    ▪ Buy solution vs. build solution
    ▪ Always plan for the worst-case scenario
    ▪ If you are using cloud, make the most of it
    ▪ Once you have logs, don’t forget them!
    ▪ Read code for hidden/undocumented features