Slide 1

Slide 1 text

eBPF Can Do It! A 5-Minute Tour of 5 Real-World PHP Issues Solved with eBPF
2026/05/26 Laravel Live Japan
Sohei Iwahori (GREE, Inc.)

Slide 2

Slide 2 text

Agenda
• What is eBPF?
• Cases
  • #1 Memcached Performance Issue Diagnosis
  • #2 Dead Code Detection
  • #3 Inspecting C extension behavior
  • #4 Measuring the time taken for legacy batch jobs
  • #5 Long-Term Internal Metrics
• Recap

Slide 3

Slide 3 text

What is eBPF?

Slide 4

Slide 4 text

eBPF Overview [1]
[1] What is eBPF? https://ebpf.io/ja/what-is-ebpf/

Slide 5

Slide 5 text

eBPF Overview [1]
[1] What is eBPF? https://ebpf.io/ja/what-is-ebpf/

Slide 6

Slide 6 text

Cases

Slide 7

Slide 7 text

#1 Memcached Performance Issue Diagnosis

Slide 8

Slide 8 text

Problem

Slide 9

Slide 9 text

bpftrace one-liner
$ sudo bpftrace -e 'uprobe:/usr/lib/x86_64-linux-gnu/libmemcached.so.11.0.0:memcached_set { printf("----"); time(); printf("key_length: %d\nkey: %s\n", arg2, str(arg1, arg2)); printf("val_length: %d\nval: %s\n", arg4, str(arg3, arg4)); }'
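To make the argN numbering in the one-liner concrete: bpftrace numbers a uprobe's arguments from 0, and libmemcached's memcached_set takes (ptr, key, key_length, value, value_length, expiration, flags). The C sketch below uses a hypothetical stand-in, describe_set (not part of libmemcached), with the same parameter order, and prints the key with "%.*s" so that exactly key_length bytes are read, which is what bpftrace's two-argument str(ptr, len) form does for a buffer that may not be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in (NOT part of libmemcached) whose leading parameters
 * follow memcached_set(ptr, key, key_length, value, value_length,
 * expiration, flags). A uprobe numbers them from 0, so in the one-liner:
 *   arg1 = key, arg2 = key_length, arg3 = value, arg4 = value_length. */
static int describe_set(void *ptr, const char *key, size_t key_length,
                        const char *value, size_t value_length,
                        time_t expiration, uint32_t flags,
                        char *out, size_t out_size)
{
    (void)ptr;
    (void)value;
    (void)expiration;
    (void)flags;
    assert(out != NULL && out_size > 0);
    /* "%.*s" prints exactly key_length bytes, so a key buffer that is not
     * NUL-terminated is still printed safely. */
    return snprintf(out, out_size, "key_length: %zu key: %.*s val_length: %zu",
                    key_length, (int)key_length, key, value_length);
}
```

For example, passing the buffer "user:42:extra" with key_length 7 prints only "user:42", just as the length-bounded read does in the probe.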

Slide 10

Slide 10 text

What the actual problem was
• A huge session-key mapping object created by the framework (FW)
• The object is updated every time an item is stored
• So we just stopped using it!

Slide 11

Slide 11 text

#2 Dead Code Detection

Slide 12

Slide 12 text

Problems caused by dead code
• Barriers to version upgrades
• Slow deployment process
• Extremely hard to tell whether code is actually unused
  • Not only web requests but also batch jobs, etc.
  • Some jobs run only once a month, or even less often

Slide 13

Slide 13 text

php-dcr [2]
[2] https://github.com/egmc/php-dcr

Slide 14

Slide 14 text

php-dcr Overview (Data Flow)
• PHP processes (Apache mod_php, PHP-FPM, PHP CLI) each expose USDT probes
• In the Linux kernel, eBPF attached to the USDT probe compile__file__return records the file path + timestamp in a BPF map
• php-dcr (Go) runs a BPF map reader that polls every 5 seconds, and scans the target directory for *.php files
• An HTTP API on :8080 (/v1/report, /v1/stats) serves the results to external clients
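The core idea behind a dead-code report can be sketched as a set difference: every *.php file under the target directory that never produced a compile__file__return event is a candidate for removal. The C below is a minimal sketch under that assumption only; it is not php-dcr's actual Go implementation, and compile_record, was_compiled, and find_unused are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical record: one entry per file that fired compile__file__return. */
struct compile_record {
    const char *path;   /* compiled file path reported by the probe */
    time_t last_seen;   /* timestamp of the most recent compile event */
};

/* Returns 1 if `path` appears among the recorded compile events. */
static int was_compiled(const struct compile_record *recs, size_t n,
                        const char *path)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(recs[i].path, path) == 0)
            return 1;
    return 0;
}

/* Copies every target file with no compile event into `unused` (which must
 * have room for n_targets entries) and returns how many were found. These
 * are only candidates: a file may simply not have run yet, e.g. a monthly
 * batch job. */
static size_t find_unused(const char *const *targets, size_t n_targets,
                          const struct compile_record *recs, size_t n_recs,
                          const char **unused)
{
    size_t n_unused = 0;
    assert(targets != NULL || n_targets == 0);
    for (size_t i = 0; i < n_targets; i++)
        if (!was_compiled(recs, n_recs, targets[i]))
            unused[n_unused++] = targets[i];
    return n_unused;
}
```

The linear scan keeps the sketch short; a real tool would more likely key the records by path in a hash map.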

Slide 15

Slide 15 text

How php-dcr Works - Sample App

Slide 16

Slide 16 text

How php-dcr Works - Sample App

Slide 17

Slide 17 text

How php-dcr Works - Run php-dcr
# php-dcr --target-dir="/var/www/sample/laravel-sample-todo/app" --emit-log-events

Slide 18

Slide 18 text

How php-dcr Works - Via HTTP

Slide 19

Slide 19 text

How php-dcr Works - Via CLI

Slide 20

Slide 20 text

How php-dcr Works - View report

Slide 21

Slide 21 text

How php-dcr Works - View report

Slide 22

Slide 22 text

How php-dcr Works - Stats

Slide 23

Slide 23 text

How php-dcr Works - OTel Logs

Slide 24

Slide 24 text

#3 Inspecting C extension behavior

Slide 25

Slide 25 text

Problem
• Storing data in APCu fails even when there is enough free memory
• We hit this in some production apps, where it caused issues such as reading stale cache entries
• apcu_store should return false on failure

Slide 26

Slide 26 text

Let's check it out ☹
...
apcu_store($cache_key, array('time' => $time, 'data' => $value), 0);
...
The return value is ignored, so the failure is invisible from the PHP side.

Slide 27

Slide 27 text

bpftrace with uretprobe
Even so, we can see the return value on the APCu extension side with eBPF:
# bpftrace -e 'uretprobe:/usr/lib/php/20190902/apcu.so:apc_cache_store { printf("%d\n", retval) }'
1
1
1
0
...

Slide 28

Slide 28 text

What happened?
• APCu had a bug: storing data requires a contiguous block of free memory, but the implementation only checked the total available size rather than verifying that a large-enough contiguous block actually existed [3]
• As a result, APCu could decide there was enough space and proceed with the store, only to fail afterward because no large-enough contiguous region was available
• This issue has been fixed in v5.1.25
• Use the fixed version!
[3] https://github.com/krakjoe/apcu/pull/532
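The difference between "enough total free memory" and "a contiguous block that is large enough" is easy to see with a toy free list. The C below is a hypothetical illustration of that class of bug, not APCu's actual allocator: three free blocks of 32 bytes add up to 96 bytes, so a total-size check accepts a 64-byte store that no single block can hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free list: sizes of the free blocks in a shared-memory pool. */
struct free_list {
    const size_t *blocks;
    size_t count;
};

static size_t total_free(const struct free_list *fl)
{
    size_t sum = 0;
    for (size_t i = 0; i < fl->count; i++)
        sum += fl->blocks[i];
    return sum;
}

static size_t largest_free_block(const struct free_list *fl)
{
    size_t max = 0;
    for (size_t i = 0; i < fl->count; i++)
        if (fl->blocks[i] > max)
            max = fl->blocks[i];
    return max;
}

/* Flawed pre-check: looks only at the total, so it can report "fits" when
 * the free space is fragmented into blocks that are each too small. */
static int fits_by_total(const struct free_list *fl, size_t need)
{
    return total_free(fl) >= need;
}

/* Sound pre-check: the allocation needs one contiguous block. */
static int fits_contiguously(const struct free_list *fl, size_t need)
{
    assert(fl != NULL);
    return largest_free_block(fl) >= need;
}
```

With blocks {32, 32, 32}, fits_by_total(&fl, 64) returns 1 while fits_contiguously(&fl, 64) returns 0; the gap between those two answers is exactly the window in which a store passes the check and then fails.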

Slide 29

Slide 29 text

#4 Measuring the time taken for legacy batch jobs

Slide 30

Slide 30 text

Problem - Added latency during migration
• Before migration: App / API, batch jobs, and the DB are co-located in network location A (low latency)
• During migration to location B: App / API and batch jobs stay in network location A, while the DB moves to network location B (target), adding +1.6 ms latency

Slide 31

Slide 31 text

Problem - Added latency during migration
• Before migration: App / API, batch jobs, and the DB are still co-located in network location A, but a delay is added artificially with the tc command to simulate the move
• During migration to location B: App / API and batch jobs stay in network location A, while the DB moves to network location B (target), adding +1.6 ms latency
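Why a fixed +1.6 ms matters for batch jobs can be seen with simple arithmetic. Assuming a job issues its DB queries one after another, the delay is paid on every round trip, so the added run time grows linearly with the number of round trips N_rt. The N_rt = 100,000 below is an illustrative figure, not a number from the talk.

```latex
\Delta T_{\text{job}} = N_{\text{rt}} \times 1.6\,\text{ms}, \qquad
N_{\text{rt}} = 100{,}000 \;\Rightarrow\; \Delta T_{\text{job}} = 160{,}000\,\text{ms} = 160\,\text{s}
```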

Slide 32

Slide 32 text

Measuring web latency is easy, but how do we check batch jobs?

Slide 33

Slide 33 text

bpftrace (script) again
#!/usr/bin/bpftrace --unsafe
#ifndef BPFTRACE_HAVE_BTF
#include <linux/sched.h>
#endif

BEGIN
{
  printf("Starting process monitoring...\n");
  printf("%-8s %-20s %-8s %-8s %-8s %s\n", "TIME", "EVENT", "PID", "ELAPSED_SEC", "COMMAND", "CODE");
}

tracepoint:sched:sched_process_exec
/comm == "php"/
{
  $pid = pid;
  // not available with kernel 5.15.0
  //printf("cmdpath: %s", args->filename);

  // Update map with process ID as key and current time as value
  @start_time[$pid] = nsecs;
  time("%Y-%m-%d %H:%M:%S ");
  // Get the command line (read from /proc/<pid>/cmdline)
  printf("%-20s %-8d %-8s %s / ", "EXEC", $pid, "", comm);
  // Get command line arguments (system() requires --unsafe)
  system("tr '\\0' ' ' < /proc/%d/cmdline", $pid);
}

tracepoint:sched:sched_process_exit
{
  $task = (struct task_struct *)curtask;
  // Only the main thread's exit ends the process (thread ID == process ID)
  if ($task->pid == $task->tgid) {
    // If the map entry exists for this process ID, retrieve the time
    if (@start_time[pid]) {
      $duration_ns = nsecs - @start_time[pid];
      $duration_sec = $duration_ns / 1000000000;
      time("%Y-%m-%d %H:%M:%S ");
      //printf("%d %-7d exit(%d) %s (execution time: %d seconds)\n", ...);
      printf("%-20s %-8d %-8d %s %d", "EXIT", pid, $duration_sec, comm, $task->exit_code >> 8);
      printf("\n");
      // Delete the map entry on exit
      delete(@start_time[pid]);
    }
  }
}

END
{
  printf("\nStopping monitoring\n");
  // Clear any remaining map entries
  clear(@start_time);
}
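The script prints `$task->exit_code >> 8` as the exit code. The kernel stores a dying task's exit code in the same layout as a wait(2) status word: the value passed to exit() sits in bits 8 to 15, while a terminating signal sits in the low 7 bits (bit 7 flags a core dump). The C sketch below decodes that word; decode_exit_code and struct exit_info are illustrative names. It also shows a caveat: for a job killed by a signal, `>> 8` yields 0, which looks like success.

```c
#include <assert.h>

/* Decoded form of a raw exit code / wait(2)-style status word. */
struct exit_info {
    int exited;  /* 1 if the process called exit(), 0 if killed by a signal */
    int status;  /* exit status (0-255) when exited == 1 */
    int signal;  /* terminating signal number when exited == 0 */
};

/* Bits 8-15: status passed to exit(). Low 7 bits: terminating signal
 * (bit 7 is the core-dump flag). Same test as WIFEXITED/WEXITSTATUS. */
static struct exit_info decode_exit_code(int code)
{
    struct exit_info info = {0, 0, 0};
    assert(code >= 0);
    if ((code & 0x7f) == 0) {
        info.exited = 1;
        info.status = (code >> 8) & 0xff;
    } else {
        info.signal = code & 0x7f;
    }
    return info;
}
```

For example, exit(1) is stored as 256 and decodes to status 1, while a process killed by SIGKILL is stored as 9 and decodes to signal 9 (and `9 >> 8` is 0).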

Slide 34

Slide 34 text

Result

Slide 35

Slide 35 text

Result

Slide 36

Slide 36 text

#5 Long-Term Internal Metrics

Slide 37

Slide 37 text

ebpf_exporter
• cloudflare/ebpf_exporter [4]
• Prometheus exporter for custom eBPF metrics and OpenTelemetry traces
• Collects metrics with a small custom eBPF program plus a YAML configuration
[4] https://github.com/cloudflare/ebpf_exporter

Slide 38

Slide 38 text

How It Works

Slide 39

Slide 39 text

eBPF Code and YAML Example

YAML (metric definition)
- name: php_request_time_sec
  help: PHP ELAPSED TIME SEC from startup to shutdown
  bucket_type: exp2
  bucket_min: 0
  bucket_max: 27
  bucket_multiplier: 0.000001 # microseconds to seconds
  labels:
    - name: request_uri
      size: 256
      decoders:
        - name: string
    - name: request_method
      size: 8
      decoders:
        - name: string
    - name: bucket
      size: 8
      decoders:
        - name: uint

eBPF code (the php_req map, the key struct, and MAX_SLOT_PHP_REQ are defined elsewhere)
// PHP USDT arguments: arg0 = file, arg1 = request_uri, arg2 = request_method
SEC("usdt//usr/lib/apache2/modules/libphp8.1.so:php:request__startup")
int BPF_USDT(request_startup, char *arg0, char *arg1, char *arg2)
{
    u64 ts = bpf_ktime_get_ns();
    u32 pid = bpf_get_current_pid_tgid(); // lower 32 bits: thread ID

    // Remember when this request started
    bpf_map_update_elem(&php_req, &pid, &ts, BPF_ANY);
    return 0;
}

SEC("usdt//usr/lib/apache2/modules/libphp8.1.so:php:request__shutdown")
int BPF_USDT(request_shutdown, char *arg0, char *arg1, char *arg2)
{
    u64 *tsp, delta_us, ts = bpf_ktime_get_ns();
    u32 pid = bpf_get_current_pid_tgid();
    struct php_req_hist_key_t hist_key = {};

    // Histogram labels: request URI and method
    bpf_probe_read_user_str(&hist_key.request_uri, sizeof(hist_key.request_uri), arg1);
    bpf_probe_read_user_str(&hist_key.request_method, sizeof(hist_key.request_method), arg2);

    tsp = bpf_map_lookup_elem(&php_req, &pid);
    if (!tsp) {
        return 0; // no matching startup event
    }

    delta_us = (ts - *tsp) / 1000; // ns to us
    bpf_map_delete_elem(&php_req, &pid); // drop the start time so entries do not go stale

    increment_exp2_histogram(&php_request_time_sec, hist_key, delta_us, MAX_SLOT_PHP_REQ);
    return 0;
}
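With bucket_type exp2, each request's latency in microseconds is reduced to a power-of-two bucket index, clamped at the top slot, and bucket_multiplier 0.000001 turns "2^b microseconds" into seconds on the Prometheus side. The C sketch below illustrates that idea using floor(log2) bucketing, the convention of classic bcc-style histograms; ebpf_exporter's own increment_exp2_histogram helper may round differently, and treating max_slot as matching bucket_max: 27 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v >= 1; returns 0 for v == 0. */
static unsigned floor_log2(uint64_t v)
{
    unsigned r = 0;
    while (v >>= 1)
        r++;
    return r;
}

/* Maps a latency in microseconds to an exp2 bucket index, clamped to
 * max_slot (assumed to play the role of bucket_max in the YAML). */
static unsigned exp2_bucket(uint64_t delta_us, unsigned max_slot)
{
    unsigned b = floor_log2(delta_us);
    return b > max_slot ? max_slot : b;
}

/* With a multiplier of 0.000001, bucket b corresponds to 2^b microseconds
 * expressed in seconds. */
static double bucket_seconds(unsigned b)
{
    assert(b < 64);
    return (double)((uint64_t)1 << b) * 0.000001;
}
```

For example, a 1000 µs request lands in bucket 9 (512 to 1023 µs), and bucket 20 corresponds to about 1.05 s. Anything at or beyond 2^27 µs (about 134 s) is folded into the last slot, which keeps the map small at the cost of resolution for very long requests.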

Slide 40

Slide 40 text

PHP Processing Latency Histogram

Slide 41

Slide 41 text

Memcached Value Length Histogram

Slide 42

Slide 42 text

Compile Event Count (per Directory)

Slide 43

Slide 43 text

Exception Count

Slide 44

Slide 44 text

Recap
• eBPF can be used to solve issues in the PHP world
• For ad-hoc investigation: bpftrace
• For long-term solutions: ebpf_exporter, dedicated tools
• Write your own!

Slide 45

Slide 45 text

Thank you for listening!

Slide 46

Slide 46 text

About Me
• Sohei Iwahori
• X / Bluesky / GitHub: @egmc
• Senior Lead Engineer at GREE, Inc.
  • Leading the Monitoring Unit
• Community
  • eBPF Japan Meetup
