Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Welcome!
I’m Verónica López, Software Engineer @ AuthZed, co-tech lead of SIG-Release @ Kubernetes.

Slide 3

Slide 3 text

Overview
01 The Problem: the role of perf & load testing in authorization systems
02 Benchmarks vs. Perf: How Fast vs How Well
03 Real World Scenarios
04 Tools: Why k6
05 Analysis: Where do we go from here?
06 Conclusions: The Good, the Bad & the Ugly

Slide 4

Slide 4 text

01 The Problem: the role of perf & load testing in authorization systems.

Slide 5

Slide 5 text

Origin: releases
From my experience releasing large distributed systems (including databases): if you want to learn the nitty-gritty details of a distributed system, get familiar with the release process. Release process ≠ deployment.

Slide 6

Slide 6 text

Origin: releases
Identifying bottlenecks in release processes helps you understand whether you need (a) more tests, (b) better communication, (c) more automation, or (d) to address tech debt, etc.

Slide 7

Slide 7 text

Origin: releases
Warning: you’ll need to be consistent and patient. Science trained me for this 🧪✨

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Common Pitfalls
1. Prod: most of the gnarly issues only show up in production; folks test in “hello world” scenarios.
2. DBs: relationships & interactions.
3. Infra: “unrelated” changes, scaling, etc.

Slide 10

Slide 10 text

Tests for Release Stability
● Identify regressions
● Highlight bottlenecks
● Understand trade-offs
● Spend less time fixing; more time solving problems to fulfill our mission

Slide 11

Slide 11 text

Perf & load testing in authorization systems ensures that frequent permission checks remain fast and stable under realistic workloads. It provides continuous feedback on the impact of code changes, so any performance regressions are caught immediately. This helps identify even minor slowdowns that can degrade developer experience or expose security gaps.

Slide 12

Slide 12 text

02 Benchmarks vs. Perf: How Fast vs How Well

Slide 13

Slide 13 text

Benchmarks vs. Perf
Sometimes, engineers mistakenly believe benchmarks and performance/load tests are equivalent, since both measure metrics such as response time and throughput.

Benchmarks
● Isolated code paths in controlled environments
● Measures raw speed (latency, throughput)
● Offers a best-case performance snapshot

Perf/Load
● Simulates real-world concurrency and multi-service interactions
● Monitors system stability, error rates, and resource usage
● Identifies bottlenecks and regressions under stress

Slide 14

Slide 14 text

Benchmarks vs. Perf
Examples!

Benchmarks
● Measuring the latency of a single permission check on an optimized Postgres instance, or
● Running an isolated query against SpiceDB to record the maximum throughput (queries per second) when no other operations are active.

Perf/Load
● Simulating 1,000 concurrent checks across multiple resources to capture latency distributions and identify bottlenecks, or
● Stress-testing cache invalidation by rapidly updating policies and then immediately querying permissions, exposing real-world delays.

Slide 15

Slide 15 text

Benchmarks vs. Perf
TL;DR
Benchmarks: How Fast
Perf/Load: How Well

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

03 Real World: Let’s play a little game ✨

Slide 18

Slide 18 text

Benchmark or Perf/Load?

Slide 19

Slide 19 text

DISCLAIMER! The code you are about to see is an exercise to illustrate the ideas shared in this presentation, simplified for brevity. These snippets aren’t comprehensive enough to describe the full capabilities of SpiceDB.

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

This test uses a ramping-arrival-rate executor to simulate increasing load, reaching 75 queries per second. The positive_checking function picks a random relationship from the dataset and performs a single permission check, isolating the operation to provide a clear measurement of raw latency and throughput under controlled conditions.
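The snippet itself didn’t survive this export, so here is a minimal sketch of what such a test could look like, assuming the authzed.api.v1 protos (and their dependencies) live under ./proto and SpiceDB listens on localhost:50051 without TLS. The positive_checking name comes from the slide; the dataset, rates, and identifiers are illustrative.

```javascript
// Hypothetical reconstruction of the check test described above, not the
// original slide code. Assumes the authzed.api.v1 protos are under ./proto
// and SpiceDB listens on localhost:50051 without TLS.
import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
client.load(['proto'], 'authzed/api/v1/permission_service.proto');

// Small inlined stand-in for the dataset of known-positive relationships.
const relationships = [
  { resource: 'doc_1', subject: 'alice' },
  { resource: 'doc_2', subject: 'bob' },
];

export const options = {
  scenarios: {
    positive_checks: {
      executor: 'ramping-arrival-rate',
      exec: 'positive_checking',
      startRate: 5,
      timeUnit: '1s',
      preAllocatedVUs: 50,
      maxVUs: 200,
      stages: [
        { target: 75, duration: '2m' }, // ramp up to 75 queries per second
        { target: 75, duration: '3m' }, // hold the peak rate
      ],
    },
  },
};

let connected = false;

export function positive_checking() {
  // Connect once per VU and reuse the connection across iterations.
  if (!connected) {
    client.connect('localhost:50051', { plaintext: true });
    connected = true;
  }

  // Pick a random relationship and perform a single, isolated permission check.
  const rel = relationships[Math.floor(Math.random() * relationships.length)];
  const res = client.invoke('authzed.api.v1.PermissionsService/CheckPermission', {
    resource: { objectType: 'document', objectId: rel.resource },
    permission: 'view',
    subject: { object: { objectType: 'user', objectId: rel.subject } },
  });

  check(res, {
    'status is OK': (r) => r && r.status === grpc.StatusOK,
    'permission granted': (r) =>
      r && r.message && r.message.permissionship === 'PERMISSIONSHIP_HAS_PERMISSION',
  });
}
```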

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

This function simulates a permission check that involves group-based logic. It calls client.invoke to send a gRPC request to the PermissionsService/CheckPermission endpoint. The request includes data specifying the resource, the permission to check, and the subject. The test alternates between two scenarios, evenly distributing different workloads across the user pool to simulate varied real-world interactions.
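Again, the original code isn’t in this export; a minimal sketch under the same assumptions as above (the resource names, the user-pool size, and alternation by iteration are all illustrative):

```javascript
// Hypothetical sketch of the group-based check described above; identifiers,
// USER_POOL_SIZE, and the alternation scheme are invented for illustration.
import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
client.load(['proto'], 'authzed/api/v1/permission_service.proto');

const USER_POOL_SIZE = 100;
let connected = false;

export default function () {
  if (!connected) {
    client.connect('localhost:50051', { plaintext: true });
    connected = true;
  }

  // Spread the load evenly across the user pool...
  const userId = `user_${__VU % USER_POOL_SIZE}`;
  // ...and alternate between two workloads: a document reachable only via
  // group membership, and an organization-wide resource.
  const resource = __ITER % 2 === 0
    ? { objectType: 'document', objectId: 'team_roadmap' }
    : { objectType: 'organization', objectId: 'acme_corp' };

  const res = client.invoke('authzed.api.v1.PermissionsService/CheckPermission', {
    resource: resource,
    permission: 'view',
    subject: { object: { objectType: 'user', objectId: userId } },
  });

  check(res, { 'check succeeded': (r) => r && r.status === grpc.StatusOK });
}
```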

Slide 24

Slide 24 text

No content

Slide 25

Slide 25 text

This test uses a ramping-arrival-rate executor to simulate increasing load on write operations. It generates a random batch of updates to simulate concurrent writes and verifies that each batch is processed successfully. The threshold ensures that the 95th percentile write latency stays below 2000ms, capturing key performance metrics under stress.
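One more hedged sketch along the same lines: the batch size, rates, and IDs are invented, but the shape (random TOUCH batches plus a p(95) threshold on k6’s built-in grpc_req_duration metric) follows the description above.

```javascript
// Hypothetical sketch of the write-load test described above; batch size,
// rates, and IDs are invented for illustration.
import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
client.load(['proto'], 'authzed/api/v1/permission_service.proto');

export const options = {
  scenarios: {
    write_load: {
      executor: 'ramping-arrival-rate',
      startRate: 1,
      timeUnit: '1s',
      preAllocatedVUs: 20,
      maxVUs: 100,
      stages: [
        { target: 25, duration: '2m' }, // ramp up concurrent write batches
        { target: 25, duration: '3m' }, // hold
      ],
    },
  },
  thresholds: {
    // Fail the run if 95th percentile request latency exceeds 2000ms.
    grpc_req_duration: ['p(95)<2000'],
  },
};

let connected = false;

// Build a random batch of TOUCH (upsert-style) relationship updates.
function randomBatch(size) {
  const updates = [];
  for (let i = 0; i < size; i++) {
    updates.push({
      operation: 'OPERATION_TOUCH',
      relationship: {
        resource: { objectType: 'document', objectId: `doc_${Math.floor(Math.random() * 10000)}` },
        relation: 'viewer',
        subject: { object: { objectType: 'user', objectId: `user_${Math.floor(Math.random() * 1000)}` } },
      },
    });
  }
  return updates;
}

export default function () {
  if (!connected) {
    client.connect('localhost:50051', { plaintext: true });
    connected = true;
  }

  const res = client.invoke('authzed.api.v1.PermissionsService/WriteRelationships', {
    updates: randomBatch(5),
  });

  check(res, { 'batch written': (r) => r && r.status === grpc.StatusOK });
}
```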

Slide 26

Slide 26 text

04 Tooling: Why k6

Slide 27

Slide 27 text

Why k6
● Virtual Users: can be ramped up and down.
● JS or TS 😛
● Built-in support for detailed metrics (e.g., latency distributions, error rates, throughput), but you can also add your own; good for diagnosing performance regressions (see the sketch below).
● Easy to understand trade-offs
● Open Source ❤
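To illustrate the custom-metrics point, a tiny sketch: permission_check_latency is an invented metric name and the threshold value is arbitrary.

```javascript
// Minimal illustration of adding your own metric on top of k6's built-ins.
// The metric name and threshold are invented for the example.
import { Trend } from 'k6/metrics';
import { sleep } from 'k6';

// A Trend collects min/max/avg/percentiles; `true` marks it as a time value.
const checkLatency = new Trend('permission_check_latency', true);

export const options = {
  thresholds: {
    permission_check_latency: ['p(99)<150'], // fail the run on slow checks
  },
};

export default function () {
  const start = Date.now();
  // ...perform a permission check here...
  sleep(0.01);
  checkLatency.add(Date.now() - start);
}
```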

Slide 28

Slide 28 text

05 Analysis: Where do we go from here?

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

Perf & load testing
Result interpretation requires deep analysis to differentiate between transient performance anomalies (or trade-offs!) and genuine regressions. We often need expert tuning and contextual understanding of production behaviours.

Slide 31

Slide 31 text

Perf & load testing
Sometimes, the actual goal of these tests can be to find the breaking point of your software and plan around it: do we fix it? Do we work around it? How do we make sure we don’t hit those values in production? How does the system degrade under load? Are we ok with it?

Slide 32

Slide 32 text

No content

Slide 33

Slide 33 text

DBs & Kubernetes enter the chat

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

We can write tests addressing the nuances of each database. We can also simulate Kubernetes workflows: what happens if these pods die while X users are performing these checks? (See the sketch below.)
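k6 itself doesn’t manage pods, so one way to sketch this pattern is to keep a steady check load running while a pod-deletion step (e.g. `kubectl delete pod -l app=spicedb` from a separate shell or CI job) runs out of band; the service address, rates, and thresholds below are illustrative.

```javascript
// Hypothetical sketch: steady check load while pods are deleted out of band.
// Service address, rates, and thresholds are invented for the example.
import grpc from 'k6/net/grpc';
import { check } from 'k6';

const client = new grpc.Client();
client.load(['proto'], 'authzed/api/v1/permission_service.proto');

export const options = {
  scenarios: {
    steady_checks: {
      executor: 'constant-vus',
      vus: 50,          // the "X users" performing checks during the disruption
      duration: '5m',
    },
  },
  thresholds: {
    grpc_req_duration: ['p(95)<500'], // latency should recover after pod churn
    checks: ['rate>0.99'],            // fail the run if >1% of checks fail
  },
};

let connected = false;

export default function () {
  if (!connected) {
    client.connect('spicedb.default.svc:50051', { plaintext: true });
    connected = true;
  }

  const res = client.invoke('authzed.api.v1.PermissionsService/CheckPermission', {
    resource: { objectType: 'document', objectId: 'doc_1' },
    permission: 'view',
    subject: { object: { objectType: 'user', objectId: 'alice' } },
  });

  check(res, { 'check ok': (r) => r && r.status === grpc.StatusOK });
}
```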

Slide 36

Slide 36 text

06 Conclusions

Slide 37

Slide 37 text

Conclusions
● Perf & load tests as a means to reach more stable releases.
● Setting up an environment that faithfully replicates prod* conditions and interpreting the data to drive actionable insights.
● Know your system. Forget the testers vs. devs mentality.
● Benchmarks vs. performance & load: know the difference.

Slide 38

Slide 38 text

Conclusions
If your tests are really to provide signal rather than noise, they need to be alive: constantly tweaked based on real-world scenarios with customers. Every post-mortem (or equivalent) is an opportunity to improve the project. Be patient.

Slide 39

Slide 39 text

Thank you! ✨