Operating Falco at Scale in Cloud-Native Environments

How can the real-time threat detection engine “Falco” be effectively operated and used to ensure security in a large-scale cloud-native environment? This presentation focuses on specific operational strategies within Mercari’s environment, where a diverse range of microservices and GKE clusters coexist. It will cover a wide range of practical know-how to address challenges unique to large-scale environments, such as mechanisms for automated cluster registration and rule deployment, structured rule management to prevent false positives, and methods for adding context to alerts to streamline incident response.

Chihiro Hasegawa and Maximilian Frank presented this at the Cloud Native Community Japan: CNAPP Security Meetup 2025-12-16.

- https://cncj-security.connpass.com/event/367233/

For the original Japanese slides see
- https://speakerdeck.com/owlinux1000/da-gui-mo-cloud-nativehuan-jing-niokerufalconoyun-yong

Maximilian Frank

December 16, 2025

Transcript

  1. Operating Falco at Scale in Cloud-Native Environments

    2025/12/16, Cloud Native Security Japan Meetup
    Chihiro Hasegawa / Maximilian Frank
  2. Speakers

    • Maximilian Frank: Threat Detection and Response team member, engaged in alert handling, incident response, and SOAR development and operations
    • Chihiro Hasegawa: Threat Detection and Response team member, in charge of detection, incident response and system development
  3. Table of Contents

    01 What is Falco
    02 Strategy of Falco Deployment
    03 Tricks to effectively use Falco
    04 Wrap up
  4. What is Falco

    “Real-time Threat Detection Engine for Container and Cloud Native”
    https://falco.org/about/
  5. How to manage Falco rules

    falcosecurity/falcoctl can manage Falco rules and plugins
    https://github.com/falcosecurity/falcoctl
    • Management of the Index
      ◦ Configure which rules will be downloaded
        ▪ https://falcosecurity.github.io/falcoctl/index.yaml
        ▪ The Index file can be managed in a variety of ways
          • Local file
          • via HTTP
          • GCS (Google Cloud Storage), Amazon S3, etc.
    • Management of the actual rule files
      ◦ Compile Falco rules into a single file
      ◦ Hosted on OCI (Open Container Initiative) compatible registries
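The "compile Falco rules into a single file" step can be sketched as below. This is a minimal sketch, not the speakers' tooling: it assumes every source file is a YAML document whose top level is a plain list (the usual shape of a Falco rules file), so concatenating the files in a fixed order yields one valid file. The function name `compile_rules` and the file names are hypothetical.

```python
from pathlib import Path


def compile_rules(sources: list[Path], target: Path) -> None:
    """Concatenate rule files, in the given order, into one file.

    Order matters: lists and macros must appear before the rules
    that reference them.
    """
    parts = []
    for src in sources:
        text = src.read_text(encoding="utf-8").strip("\n")
        # Assumption: plain top-level lists only. A '---' document marker
        # would split the output into several YAML documents.
        if text.startswith("---"):
            raise ValueError(f"{src} uses a document marker; merge it manually")
        # A comment line records where each block came from.
        parts.append(f"# source: {src.name}\n{text}\n")
    target.write_text("\n".join(parts), encoding="utf-8")
```

The resulting file would then be packaged and pushed to an OCI registry with whatever tooling the team uses; that step is outside this sketch.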
  6. Mercari's diverse service environment

    • Average Falco Pod count: 3500
    • Monitored clusters: currently 16
    • Kinds of container workload: API, ETL, Blockchain, Security, AI, etc.
  7. Key challenges for operating Falco at scale

    • Cluster management & registration: How can we manage the clusters?
    • Rule management: How can we deploy rule sets?
    • Falco deployment: How do we deploy Falco into each cluster?
    • Monitoring: How can we know if our security monitoring is actually running?
  8. Monitoring Falco

    Monitoring your security monitoring is also important!
    • Poll (active)
      ◦ must collect metrics from all pods
      ◦ more components to deploy
    • Push (passive)
      ◦ uses the same pipeline as alerts
      ◦ simple deployment
    Labels: used, not used
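One way to picture the push (passive) approach is a heartbeat check on the receiving side. The sketch below assumes each cluster periodically sends a heartbeat event through the same pipeline as its alerts, and that the receiver keeps the last time it heard from each cluster. The heartbeat mechanism, the function name `stale_clusters`, and the cluster names are illustrative assumptions, not details from the talk.

```python
from datetime import datetime, timedelta, timezone


def stale_clusters(
    last_seen: dict[str, datetime],
    now: datetime,
    max_silence: timedelta,
) -> list[str]:
    """Return clusters whose last heartbeat is older than max_silence."""
    return sorted(
        name for name, seen in last_seen.items() if now - seen > max_silence
    )


# Usage with hypothetical cluster names and a 10-minute threshold.
now = datetime(2025, 12, 16, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "prod-a": now - timedelta(minutes=2),
    "dev-b": now - timedelta(minutes=30),
}
silent = stale_clusters(last_seen, now, timedelta(minutes=10))
```

Because the heartbeat travels the alert path, a silent cluster can mean either that Falco is down or that the pipeline is broken; both are worth a page.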
  9. Organization of rules and exceptions

    • Cluster
      ◦ production, development and testing
    • System
      ◦ GKE
    • Application
      ◦ Elasticsearch, ArgoCD, Istio and so on
    “Categorizing rules and exceptions simplifies management”
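The category layering above could map onto a file layout like the one sketched here. The directory names, the `.yaml` naming, and the function `rule_files_for` are hypothetical; the talk only names the three categories.

```python
CLUSTER_ENVS = {"production", "development", "testing"}


def rule_files_for(cluster_env: str, system: str, apps: list[str]) -> list[str]:
    """List the rule/exception files for one cluster, broadest layer first."""
    if cluster_env not in CLUSTER_ENVS:
        raise ValueError(f"unknown cluster environment: {cluster_env}")
    files = [f"cluster/{cluster_env}.yaml", f"system/{system}.yaml"]
    # Application layers are sorted so the output is deterministic.
    files += [f"application/{app}.yaml" for app in sorted(apps)]
    return files
```

Keeping one file per category entry means an exception for, say, Istio lives in exactly one place and is only shipped to clusters that run it.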
  10. Strategy for adding or updating exceptions

    • To prevent detected events from being unintentionally classified as exceptions, define exceptions as narrowly as possible
    • To ensure extensibility, make use of lists and macros
      ◦ If there are multiple elements to be exempted, define the exception condition using a list
      ◦ Define a macro for complex conditions and reuse it
    “As narrowly as possible, but to ensure the extensibility”
  11. Strategy for adding or updating exceptions: good example

    • Changed to use the = operator to exactly match the command defined in the manifest file.
    • Changed to use the startswith operator to support dynamically generated Pod names.
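The effect of those two operators can be modelled in a few lines. This is a Python model of the matching logic only, not Falco rule syntax; it assumes the exception compares the Falco fields `proc.cmdline` (exact match, like `=`) and `k8s.pod.name` (prefix match, like `startswith`), and the pod prefix and command shown are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionEntry:
    pod_prefix: str  # compared like Falco's `startswith`
    cmdline: str     # compared like Falco's `=`


def is_exempt(event: dict[str, str], entries: list[ExceptionEntry]) -> bool:
    """True only if both conditions of at least one entry hold."""
    return any(
        event["k8s.pod.name"].startswith(e.pod_prefix)
        and event["proc.cmdline"] == e.cmdline
        for e in entries
    )


# Hypothetical exception: a batch job whose pods get a random suffix.
entries = [ExceptionEntry(pod_prefix="report-job-", cmdline="sh -c /app/run.sh")]
expected_run = {"k8s.pod.name": "report-job-x7k2p", "proc.cmdline": "sh -c /app/run.sh"}
extra_args = {"k8s.pod.name": "report-job-x7k2p", "proc.cmdline": "sh -c /app/run.sh; id"}
```

The exact match on the command keeps the exception narrow (an appended command no longer matches), while the prefix match on the pod name keeps it usable across pod restarts.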
  12. Falco alert handling

    Looking only at the rule output, it is hard to know who performed the action or why it happened.
  13. Falco alert handling

    • Context from the internal service used to request access to the production environment is added
      ◦ Easily understand the context of manual operations in containers
      ◦ Shifting to Zero Touch Production
    • Utilize Kubernetes and Google Cloud audit logs
      ◦ Identify container.id, project, etc.
    • Information is added to alert tickets using Google Workflows
      ◦ Detection Engineering and SOAR at Mercari
    “Enriching alerts with additional information makes triage easier”
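The enrichment idea can be sketched as a join between an alert and the access requests that were active when it fired. The record shapes (`namespace`, `start`, `end`, `requester`, `reason`) and the function `enrich_alert` are assumptions for illustration; the talk states that Mercari does this with Google Workflows, and the internal access-request service's schema is not described.

```python
from datetime import datetime, timezone


def enrich_alert(alert: dict, access_requests: list[dict]) -> dict:
    """Return a copy of the alert with overlapping access requests attached."""
    fired_at = alert["time"]
    matches = [
        {"requester": r["requester"], "reason": r["reason"]}
        for r in access_requests
        if r["namespace"] == alert["namespace"] and r["start"] <= fired_at <= r["end"]
    ]
    enriched = dict(alert)  # leave the original alert untouched
    enriched["access_requests"] = matches
    return enriched


# Usage with made-up data.
utc = timezone.utc
alert = {
    "rule": "Terminal shell in container",
    "namespace": "payments",
    "time": datetime(2025, 12, 16, 9, 30, tzinfo=utc),
}
requests = [
    {
        "requester": "alice",
        "reason": "debug failed migration",
        "namespace": "payments",
        "start": datetime(2025, 12, 16, 9, 0, tzinfo=utc),
        "end": datetime(2025, 12, 16, 10, 0, tzinfo=utc),
    }
]
ticket = enrich_alert(alert, requests)
```

An alert with an empty `access_requests` list is then the interesting case for triage: a manual action with no approved request behind it.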
  14. Wrap Up

    Strategy for Falco Deployment
    • Flexible Falco deployment strategy to support diverse cluster environments and CI/CD systems
    • How to monitor Falco itself
      ◦ Pros and cons of polling and push approaches
    Tricks to effectively use Falco
    • Organizing rules into a layered structure using categories
    • Strategy for adding exceptions
      ◦ “As narrowly as possible, but to ensure the extensibility”
    • Add context to Falco alerts
  15. Appendix

    • An Introduction to Reverse Engineering for eBPF Bytecode
      ◦ https://engineering.mercari.com/en/blog/entry/20240228-an-introduction-to-reverse-engineering-for-ebpf-bytecode/
    • Detection Engineering and SOAR at Mercari
      ◦ https://engineering.mercari.com/en/blog/entry/20220513-detection-engineering-and-soar-at-mercari/
    • Shifting to Zero Touch Production
      ◦ https://engineering.mercari.com/en/blog/entry/20220126-shifting-to-zero-touch-production/