Slide 1

Slide 1 text

eBPF-based Process Lifecycle Monitoring - Introduction to Tetragon Implementation - Yuki Nakamura March 15, 2025 Cloud Native Community Japan - eBPF Japan Meetup #3

Slide 2

Slide 2 text

Whoami: Yuki Nakamura 👨‍💻 Platform Engineer ex-IBM Group, Mapbox Tech Blog Container & Kubernetes related tools (ArgoCD, Buildkit, etc.) eBPF (Tetragon, Aya) 🐝 My Journey with eBPF Motivated by Documentary: Unlocking the Kernel The story of eBPF code being merged into the Linux Kernel codebase The growing popularity of eBPF Notable quote: "This is like putting Javascript into the kernel." - Brendan Gregg Tetragon Contributing to the Tetragon Project Tetragon-mini: Learning eBPF by rewriting Tetragon in Rust

Slide 3

Slide 3 text

Tetragon Demo: Process Lifecycle Monitoring https://youtu.be/P7Pork8-hp8

Slide 4

Slide 4 text

Thoughts🤔 It’s impressive, but what is it useful for? What is Tetragon designed to do? How is Process Lifecycle Monitoring implemented? What kernel event hooks are being used? What code is written on the eBPF side and user space side?

Slide 5

Slide 5 text

Agenda 1. Overview of Tetragon and Use Cases Runtime Enforcement Security Observability Analysis of collected Process Lifecycle data 2. Process Lifecycle Monitoring Mechanism Linux prerequisite knowledge Tetragon code explanation

Slide 6

Slide 6 text

Questions🙋‍♀️🙋‍♂️ Are you familiar with the Tetragon project? Have you used Tetragon before? Are you using Tetragon in a production environment?

Slide 7

Slide 7 text

Tetragon: Overview
CNCF project, subproject of Cilium
Written in C (eBPF) and Go (user space), similar to Cilium
v1.0 released in November 2023
    2023-11-01: v1.0
    2024-04-29: v1.1
    2024-09-05: v1.2
    2024-12-13: v1.3 (the code I'll introduce today is from v1.3)
In one sentence, Tetragon is an eBPF-based Security Observability & Runtime Enforcement tool.

Slide 8

Slide 8 text

Tetragon: Runtime Enforcement
A mechanism to instantly control syscalls that match certain rules within kernel space.
Tracing Policy (rules): defines the kernel events to trace and the actions to take when conditions are met.
    Example 1: Kill processes whose sys_write calls attempt to write to /etc/passwd, except those with PID 0 or 1
    Example 2: Prohibit execution of specific binary files
Using eBPF allows processing to be completed without transferring events to user space. This approach enables low-latency and reliable security policy enforcement.
[Diagram labels: Tetra CLI, Tracing Policy, Tetragon Agent (Set up eBPF Programs/Maps), Kernel Event, Syscall Event, eBPF Program, eBPF Map, Kill / Override, Process]

Slide 9

Slide 9 text

Tetragon: Security Observability
Real-time observation and analysis of security-related events in the kernel.
Event examples: File Access, TCP Connection Event, Process Lifecycle (execution/termination), etc.
eBPF programs detect events and transfer them to the user space Tetragon Agent via eBPF Maps.
Any collector and any storage/analytics tool can be used for storage and analysis.
[Diagram labels: Tetra CLI, Tracing Policy, Tetragon Agent (Set up eBPF Programs/Maps), Kernel Event, Syscall Event, eBPF Program, eBPF Map, Process, tetragon.log, Collector (fluentd, optl), Storage/Analytics Tool (Grafana loki, S3, Athena)]

Slide 10

Slide 10 text

Analysis 1: Finding Processes with Elevated Privileges
Searching for processes with CAP_SYS_ADMIN [1].
[1] Search events in tetragon.log (JSONL) using DuckDB.

Slide 11

Slide 11 text

Analysis 2: Detecting Suspicious Shell Execution Searching for processes executed from shells. Access paths can also be understood by recursively searching parent processes of the shell.
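The recursive parent search mentioned here can be sketched in C. This is only an illustration under assumptions: the `proc_rec` record and its field names are hypothetical and do not mirror Tetragon's actual event schema, and the records are assumed to be already loaded into an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record: one entry per process seen in the log. */
struct proc_rec {
    uint32_t pid;        /* process ID (TGID) */
    uint32_t parent_pid; /* parent's process ID, 0 if unknown */
    const char *binary;  /* executable path */
};

/* Linear lookup of a record by process ID; returns NULL if absent. */
static const struct proc_rec *find_proc(const struct proc_rec *recs, size_t n,
                                        uint32_t pid)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].pid == pid)
            return &recs[i];
    return NULL;
}

/*
 * Writes the ancestors of `pid` into out[], nearest parent first, and
 * returns how many were written. `max` bounds the walk, which also
 * protects against cycles in malformed data.
 */
size_t ancestor_chain(const struct proc_rec *recs, size_t n, uint32_t pid,
                      const struct proc_rec **out, size_t max)
{
    assert(out != NULL || max == 0);

    size_t depth = 0;
    const struct proc_rec *cur = find_proc(recs, n, pid);
    while (cur != NULL && depth < max) {
        const struct proc_rec *parent = find_proc(recs, n, cur->parent_pid);
        if (parent == NULL)
            break; /* reached the root or a process missing from the log */
        out[depth++] = parent;
        cur = parent;
    }
    return depth;
}
```

Starting from a process spawned by a shell, the returned chain shows how the shell itself was reached (for example via sshd). Because PIDs are reused over time, a real analysis would want a key that is unique per execution rather than the bare PID used in this sketch.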

Slide 12

Slide 12 text

Agenda 1. Overview of Tetragon and Use Case Runtime Enforcement Security Observability Analysis of collected Process Lifecycle 2. Process Lifecycle Monitoring Mechanism Linux prerequisite knowledge Process data structure TGID and PID Process Management Syscalls Tetragon code explanation

Slide 13

Slide 13 text

Linux Basics: task_struct
task_struct is the data structure in the Linux kernel that manages each process (or thread).
Linux: include/linux/sched.h
    struct task_struct {
        pid_t pid;
        pid_t tgid;
        char comm[TASK_COMM_LEN];  // Command name of the process
        struct nsproxy *nsproxy;   // Namespace
        struct mm_struct *mm;      // Pointer to the user-space memory management information used by the process
        ...
Tetragon collects process information from this task_struct.
eBPF helper function bpf_get_current_task(): gets a pointer to the task_struct for the current process (thread).
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    get_namespaces(&event->ns, task); // Get namespace information from task_struct. get_namespace is a function defined wit

Slide 14

Slide 14 text

Linux: TGID and PID
TGID is the process identifier. PID is the thread identifier.
⚠️ PID is not a process identifier ⚠️

    Example        tgid   pid   comm
    Single Thread  100    100   binary_1
    Multi Thread   200    200   binary_2
    Multi Thread   200    201   binary_2
    Multi Thread   200    202   binary_2

Each row is one task_struct.
Tetragon monitors events at the process level. Creation/deletion of threads (multithreading) within a process is ignored.
eBPF helper function bpf_get_current_pid_tgid(): gets pid and tgid
    u64 pid_tgid = bpf_get_current_pid_tgid();
    u32 tgid = pid_tgid >> 32;       // Get TGID from upper 32 bits
    u32 pid = pid_tgid & 0xFFFFFFFF; // Get PID from lower 32 bits
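The bit layout can be tried out in plain user-space C. The sketch below is not Tetragon code: `task_ids`, `pack_pid_tgid`, `split_pid_tgid`, and `is_group_leader` are hypothetical names that only reproduce the upper/lower 32-bit split described on this slide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two identifiers carried in one 64-bit pid_tgid value. */
struct task_ids {
    uint32_t tgid; /* process identifier (thread group ID) */
    uint32_t pid;  /* thread identifier */
};

/* The packed value is exactly two 32-bit halves. */
static_assert(sizeof(struct task_ids) == sizeof(uint64_t),
              "task_ids must cover the whole 64-bit value");

/* Packs the IDs in the layout used by the helper: TGID in the upper half. */
static inline uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Splits a packed value back into its TGID and PID. */
static inline struct task_ids split_pid_tgid(uint64_t pid_tgid)
{
    struct task_ids ids = {
        .tgid = (uint32_t)(pid_tgid >> 32),
        .pid = (uint32_t)(pid_tgid & 0xFFFFFFFFu),
    };
    return ids;
}

/* The first thread of a process has pid == tgid. */
static inline bool is_group_leader(struct task_ids ids)
{
    return ids.pid == ids.tgid;
}
```

With the multi-thread example from this slide, the task with pid 200 is the group leader of TGID 200, while pids 201 and 202 are additional threads of the same process.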

Slide 15

Slide 15 text

Linux: Process Management Syscall
Processes are created, executed, and terminated through the following steps:
1. The parent process creates a child process by calling fork(), clone(), or similar syscalls.
2. The child process executes a program by calling execve() or similar syscalls.
3. When execution is complete, the child process terminates by calling exit() or similar syscalls.
[Diagram: Parent calls fork() to create Child; Child calls execve() and then exit(); Parent calls wait()]
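The three steps can be reproduced with a few lines of POSIX C. This is a minimal sketch: `run_and_wait` is a hypothetical helper name, and it assumes an `sh` binary is available on the PATH.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs `sh -c cmd` in a child process and returns the child's exit status,
 * or -1 if the child could not be created or did not exit normally.
 */
int run_and_wait(const char *cmd)
{
    assert(cmd != NULL);

    pid_t child = fork(); /* step 1: the parent creates a child */
    if (child < 0)
        return -1;

    if (child == 0) {
        /* step 2: the child replaces its image with a new program */
        execlp("sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* only reached if the exec call failed */
    }

    /* step 3 happens in the child when the program exits;
     * the parent collects the result with wait */
    int status = 0;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `run_and_wait("exit 7")` returns 7. Tracing such a program with strace typically shows clone() rather than fork(), because the C library usually implements fork() on top of clone().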

Slide 16

Slide 16 text

Example: Syscalls During ls Command Execution
The syscalls traced when bash executes the ls command are as follows.
Terminal1 (bash, PID=23167):
    ls -la
Terminal2:
    strace -fp 23167 2>&1 | grep -e clone -e execve -e exit
    clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 23895 attached
    [pid 23895] execve("/usr/bin/ls", ["ls", "--color=auto", "-la"], 0xbb5be6b20cc0 /* 24 vars */) = 0
    [pid 23895] exit_group(0) = ?
    [pid 23895] +++ exited with 0 +++
The following syscalls were called:
1. Creation: clone()
2. Execution: execve()
3. Termination: exit_group()

Slide 17

Slide 17 text

Agenda 1. Overview of Tetragon and Use Case Runtime Enforcement Security Observability Analysis of collected Process Lifecycle 2. Process Lifecycle Monitoring Mechanism Linux prerequisite knowledge Process data structure TGID and PID Process Management Syscalls Tetragon code explanation Fork Execve Exit

Slide 18

Slide 18 text

Process Lifecycle Monitoring Demo (Review)

Slide 19

Slide 19 text

Tetragon: Process Lifecycle Monitoring - Overview
Attach eBPF programs that create Process Lifecycle Events to hooks of each Process Management Syscall. The eBPF programs transfer the created events to user space via eBPF Maps.
[Diagram: in kernel space, the fork-, execve-, and exit-related syscalls trigger the eBPF programs that create the Clone, Execve, and Exit Events respectively; the events reach the Tetragon Agent in user space through an eBPF Map]

Slide 20

Slide 20 text

Fork: eBPF Program and Hook Point
The eBPF program with section name kprobe/wake_up_new_task is attached to the kprobe of wake_up_new_task.
Tetragon UserSpace: base.go
    Fork = program.Builder(
        "bpf_fork.o",              // the name of the BPF object file
        "wake_up_new_task",        // the hook point
        "kprobe/wake_up_new_task", // the program section name
        "kprobe_pid_clear",        // the name of pin
        "kprobe",                  // the type of BPF program
    ).SetPolicy(basePolicy)
[Diagram: fork-related syscalls -> kprobe wake_up_new_task -> event_wake_up_new_task -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]

Slide 21

Slide 21 text

Fork: Hook Point (wake_up_new_task)
wake_up_new_task is a function called within kernel_clone(), which is the main routine of fork.
Linux: kernel/fork.c -> kernel_clone()
    pid_t kernel_clone(struct kernel_clone_args *args)
    {
        struct task_struct *p;
        ...
        wake_up_new_task(p);
        ...

Slide 22

Slide 22 text

Fork: eBPF Program
Assembles the Clone Event and writes it to the eBPF Map: tcpmon_map
Tetragon eBPF: bpf_fork.c
    __attribute__((section("kprobe/wake_up_new_task"), used)) int
    BPF_KPROBE(event_wake_up_new_task, struct task_struct *task)
    {
        struct msg_clone_event msg;
        ...
        perf_event_output_metric(ctx, MSG_OP_CLONE, &tcpmon_map,
                                 BPF_F_CURRENT_CPU, &msg, msg_size); // Write msg_clone_event to tcpmon_map
Mechanism to ignore thread creation and deletion within processes: since wake_up_new_task is also called when creating a thread, the program checks whether a Clone Event has already been created for the same TGID, and creates a new one only if it has not.
    curr = execve_map_get(tgid);
    if (curr->key.ktime != 0) // Check whether the event for the tgid has already been created.
        return 0;
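The "already created for this TGID" check can be modeled in user space. The sketch below is an analogy, not Tetragon's implementation: `proc_table`, `table_get`, and `on_new_task` are hypothetical names, and a small fixed-size array stands in for the eBPF map keyed by TGID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 64 /* arbitrary capacity chosen for this sketch */

struct proc_entry {
    uint32_t tgid;
    uint64_t ktime; /* 0 means "no clone event recorded yet" */
};

struct proc_table {
    struct proc_entry slots[MAX_PROCS];
    size_t used;
};

/* Returns the entry for tgid, creating an empty one if needed.
 * Returns NULL when the table is full. */
static struct proc_entry *table_get(struct proc_table *t, uint32_t tgid)
{
    for (size_t i = 0; i < t->used; i++)
        if (t->slots[i].tgid == tgid)
            return &t->slots[i];
    if (t->used == MAX_PROCS)
        return NULL;
    struct proc_entry *e = &t->slots[t->used++];
    e->tgid = tgid;
    e->ktime = 0;
    return e;
}

/* Called for every new task, whether process or thread.
 * Returns true only when a clone event should be emitted. */
bool on_new_task(struct proc_table *t, uint32_t tgid, uint64_t now)
{
    assert(t != NULL && now != 0); /* 0 is reserved as the "unset" marker */

    struct proc_entry *e = table_get(t, tgid);
    if (e == NULL)
        return false; /* table full: drop the event */
    if (e->ktime != 0)
        return false; /* event already created: this is a thread of a known process */
    e->ktime = now;
    return true;
}
```

The first task with TGID 200 produces an event; later tasks that share TGID 200 (its threads) do not.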

Slide 23

Slide 23 text

Execve
[Diagram: the process lifecycle overview from Slide 19]

Slide 24

Slide 24 text

Execve: eBPF Program and Hook Point
The eBPF program with section name tracepoint/sys_execve is attached to the tracepoint sched/sched_process_exec.
Tetragon UserSpace: base.go
    Execve = program.Builder(
        config.ExecObj(),           // the name of the BPF object file
        "sched/sched_process_exec", // the hook point
        "tracepoint/sys_execve",    // the program section name
        "event_execve",             // the name of pin
        "execve",                   // the type of BPF program
    ).SetPolicy(basePolicy)
[Diagram: execve-related syscalls -> tracepoint sched_process_exec -> event_execve -> (Tail Call) execve_rate -> (Tail Call) execve_send -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]

Slide 25

Slide 25 text

Execve: Hook Point (sched/sched_process_exec)
The tracepoint sched/sched_process_exec is triggered when a process executes a new program.
Mechanism to ignore thread creation and deletion within processes: when a thread is created within a process, sched/sched_process_exec is not triggered.
Reference: when writing eBPF programs for tracepoints, it's important to check the format of the data available in that tracepoint. This can be confirmed using the following command.
    cat /sys/kernel/debug/tracing/events/sched/sched_process_exec/format
    name: sched_process_exec
    ID: 267
    format:
        field:unsigned short common_type; offset:0; size:2; signed:0;
        field:unsigned char common_flags; offset:2; size:1; signed:0;
        field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
        field:int common_pid; offset:4; size:4; signed:1;

        field:__data_loc char[] filename; offset:8; size:4; signed:0;
        field:pid_t pid; offset:12; size:4; signed:1;
        field:pid_t old_pid; offset:16; size:4; signed:1;

    print fmt: "filename=%s pid=%d old_pid=%d", __get_str(filename), REC->pid, REC->old_pid
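The filename field in this format is a `__data_loc` entry: a 4-byte value whose lower 16 bits hold the offset of the string from the start of the record and whose upper 16 bits hold its length. A user-space sketch of decoding it follows. `exec_filename` and `FILENAME_FIELD_OFFSET` are hypothetical names, and the field offset of 8 is taken from the listing on this slide, so it may differ on other kernels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILENAME_FIELD_OFFSET 8 /* "offset:8" in the format listing on this slide */

/*
 * Resolves the __data_loc filename field of a raw sched_process_exec record.
 * Returns a pointer into `rec`, or NULL if the field points outside the
 * record or the string is not NUL-terminated within its stated length.
 */
const char *exec_filename(const void *rec, size_t rec_size)
{
    assert(rec != NULL);

    if (rec_size < FILENAME_FIELD_OFFSET + sizeof(uint32_t))
        return NULL;

    /* memcpy avoids an unaligned 32-bit read from the byte buffer */
    uint32_t loc;
    memcpy(&loc, (const unsigned char *)rec + FILENAME_FIELD_OFFSET, sizeof(loc));

    size_t off = loc & 0xFFFF; /* lower 16 bits: offset from record start */
    size_t len = loc >> 16;    /* upper 16 bits: length including the NUL */
    if (len == 0 || off + len > rec_size)
        return NULL;
    if (((const char *)rec)[off + len - 1] != '\0')
        return NULL;

    return (const char *)rec + off;
}
```

The masking with 0xFFFF is the same operation that the Tetragon excerpt on the next slide applies to `__data_loc_filename`.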

Slide 26

Slide 26 text

Execve: eBPF Program
Assembles the Execve Event (msg_execve_event) and writes it to the eBPF Map: tcpmon_map
Tetragon eBPF: bpf_execve_event.c
event_execve:
    __attribute__((section("tracepoint/sys_execve"), used)) int
    event_execve(struct trace_event_raw_sched_process_exec *ctx)
    {
        struct task_struct *task = (struct task_struct *)get_current_task();
        char *filename = (char *)ctx + (_(ctx->__data_loc_filename) & 0xFFFF); // Use __data_loc_filename in ctx
        struct msg_execve_event *event;
        ...
execve_send:
    __attribute__((section("tracepoint"), used)) int
    execve_send(void *ctx __arg_ctx)
    {
        ...
        // Write msg_execve_event to tcpmon_map
        perf_event_output_metric(ctx, MSG_OP_EXECVE, &tcpmon_map, BPF_F_CURRENT_CPU, event, size);

Slide 27

Slide 27 text

Execve: Tail Call
Execve Event processing uses three sequential eBPF programs connected by Tail Calls:
1. event_execve: Assembles the Execve Event
2. execve_rate: Suppresses monitoring when a large number of events occur per cgroup (event throttling)
3. execve_send: Writes the Execve Event to the eBPF Map
Benefits of introducing Tail Calls:
    Separation of logic
    Avoiding the eBPF Verifier's program size limitation [1]
    Reduction of stack usage (maximum 512 bytes)
[Diagram: execve-related syscalls -> tracepoint sched_process_exec -> event_execve -> (Tail Call) execve_rate -> (Tail Call) execve_send -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]
[1] Until v5.2, the instruction limit was 4k and the complexity limit was 128k. Afterwards, these limits were raised to 1M.

Slide 28

Slide 28 text

Tips: Data Sharing Between Tail Calls
A Tail Call cannot pass data to the next program: only the context pointer carries over, and the stack is not preserved. As a solution, eBPF Maps are used to share Event data between eBPF programs.
Tetragon eBPF: process.h
    struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, struct msg_execve_event);
    } execve_msg_heap_map SEC(".maps");
[Diagram: event_execve, execve_rate, and execve_send each read/write a PerCpuArray (one slot per CPU, CPU:1 ... CPU:n) that serves as storage for sharing state across the Tail Calls; execve_send writes to tcpmon_map (perf_event_array), which the Tetragon Agent reads in user space]
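The pattern of chained stages that share state through a scratch slot can be imitated in user-space C. This is an analogy under stated assumptions, not eBPF code: a function-pointer table stands in for the program array, a static struct stands in for the one-entry per-CPU array, and the throttle is a made-up global budget rather than a per-cgroup rate limit. All names (`exec_event`, `stage_build`, `handle_exec`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the event payload shared between stages. */
struct exec_event {
    uint32_t tgid;
    char filename[32];
    int sent;
};

enum stage { STAGE_BUILD, STAGE_RATE, STAGE_SEND, STAGE_COUNT, STAGE_DONE = -1 };

/* Scratch slot: plays the role of the one-entry per-CPU array map. */
static struct exec_event scratch;

/* The only thing handed from stage to stage, like the eBPF context pointer. */
struct input {
    uint32_t tgid;
    const char *filename;
};

typedef int (*stage_fn)(const struct input *in);

/* Stage 1: assemble the event in the scratch slot. */
static int stage_build(const struct input *in)
{
    memset(&scratch, 0, sizeof(scratch));
    scratch.tgid = in->tgid;
    strncpy(scratch.filename, in->filename, sizeof(scratch.filename) - 1);
    return STAGE_RATE;
}

static unsigned budget = 2; /* hypothetical: events allowed before throttling */

/* Stage 2: stop the chain when the budget is used up. */
static int stage_rate(const struct input *in)
{
    (void)in;
    if (budget == 0)
        return STAGE_DONE;
    budget--;
    return STAGE_SEND;
}

/* Stage 3: "send" the event that stage 1 left in the scratch slot. */
static int stage_send(const struct input *in)
{
    (void)in;
    scratch.sent = 1;
    return STAGE_DONE;
}

static const stage_fn stages[STAGE_COUNT] = { stage_build, stage_rate, stage_send };

/* Runs the chain; returns 1 if the event was sent, 0 if it was throttled. */
int handle_exec(uint32_t tgid, const char *filename)
{
    assert(filename != NULL);

    struct input in = { tgid, filename };
    int next = STAGE_BUILD;
    while (next != STAGE_DONE)
        next = stages[next](&in);
    return scratch.sent;
}
```

One difference matters: a real tail call never returns to its caller, so the dispatch loop here is a simplification. What carries over is that each stage sees only the context and whatever the previous stage stored in the shared slot.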

Slide 29

Slide 29 text

Exit
[Diagram: the process lifecycle overview from Slide 19]

Slide 30

Slide 30 text

Exit: eBPF Program and Hook Point
The eBPF program with section name kprobe/acct_process is attached to the kprobe of acct_process.
Tetragon UserSpace: base.go
    Exit = program.Builder(
        "bpf_exit.o",          // the name of the BPF object file
        "acct_process",        // the hook point
        "kprobe/acct_process", // the program section name
        "event_exit",          // the name of pin
        "kprobe",              // the type of BPF program
    ).SetPolicy(basePolicy)
[Diagram: exit-related syscalls -> kprobe acct_process -> event_exit_acct_process -> tcpmon_map (perf_event_array) -> Tetragon Agent in user space]

Slide 31

Slide 31 text

Exit: Hook Point (acct_process)
The acct_process function is called within do_exit() when a Thread Group is removed.
Linux: kernel/exit.c -> do_exit()
    void __noreturn do_exit(long code)
    {
        ...
        if (group_dead)
            acct_process();
Mechanism to ignore thread creation and deletion within processes: acct_process runs only once when a process terminates.
For kernels without acct_process, disassociate_ctty is used instead.
Reference: previously, the tracepoint sched/sched_process_exit or the kprobe kprobe/__put_task_struct was used.
    tetragon: Switch exit tracepoint to __put_task_struct kprobe #558
    tetragon: Hook exit sensor on acct_process #1509

Slide 32

Slide 32 text

Exit: eBPF Program
Assembles the Exit Event (msg_exit) and writes it to the eBPF Map: tcpmon_map
Tetragon eBPF: bpf_exit.c (kprobe/acct_process section)
    __attribute__((section("kprobe/acct_process"), used)) int
    event_exit_acct_process(struct pt_regs *ctx)
    {
        __u64 pid_tgid = get_current_pid_tgid();

        event_exit_send(ctx, pid_tgid >> 32);
        return 0;
    }
Tetragon eBPF: bpf_exit.h
    FUNC_INLINE void event_exit_send(void *ctx, __u32 tgid)
    {
        struct msg_exit *exit;
        ...
        exit->info.tid = tgid;
        ...
        perf_event_output_metric(ctx, MSG_OP_EXIT, &tcpmon_map, BPF_F_CURRENT_CPU, exit, size); // Write msg_exit to tcpmon_map

Slide 33

Slide 33 text

Process Lifecycle Monitoring - Detailed Implementation
Process Lifecycle Monitoring is achieved using 3 hook points, 5 eBPF programs, and multiple eBPF maps.
[Diagram:
    fork-related syscalls -> kprobe wake_up_new_task -> event_wake_up_new_task
    execve-related syscalls -> tracepoint sched_process_exec -> event_execve -> (Tail Call) execve_rate -> (Tail Call) execve_send
    exit-related syscalls -> kprobe acct_process -> event_exit_acct_process
    the events are written to tcpmon_map (perf_event_array) and read in user space by the Tetragon Agent, alongside the Tetra CLI]

Slide 34

Slide 34 text

Wrap up 1. Explained Tetragon’s Runtime Enforcement and Security Observability 2. Explained Linux fundamentals (task_struct, TGID and PID, Process-related Syscalls) 3. Introduced portions of Tetragon/Kernel code (Hook Points and eBPF Programs for Fork/Execve/Exit) 4. Presented eBPF tips (Tracepoint Data Format, Tail Call, data sharing using Per-CPU Maps)