Slide 1

Slide 1 text

Conditional Authorization for Kubernetes
Lucas Käldström, Upbound
Kubernetes SIG Auth Community Meeting, 2025-06-04
@luxas.dev

Slide 2

Slide 2 text

Contributor since 2015
MSc student since 2019
Staff Software Engineer since 2024

Slide 3

Slide 3 text


Slide 4

Slide 4 text

TL;DR:
Unified UX across authz & admission
Unified UX across admission reads & writes
⇒ Conditional Authorization

Slide 5

Slide 5 text

Note: Focus only on admission that relates to the principal

Slide 6

Slide 6 text

Inspiration

Slide 7

Slide 7 text

Expressive

Slide 8

Slide 8 text

Fast Expressive

Slide 9

Slide 9 text

Fast Safe Expressive

Slide 10

Slide 10 text

Fast Safe Analyzable Expressive

Slide 11

Slide 11 text

Fast Safe Analyzable Expressive Correct

Slide 12

Slide 12 text

Diagram labels: Fast, Safe, Analyzable, Expressive, Correct; RBAC, Webhooks, VAP; UX gap in authorization; Turing-completeness wall

Slide 13

Slide 13 text

Diagram labels: Fast, Safe, Analyzable, Expressive, Correct; RBAC, Webhooks, VAP; UX gap in authorization; Turing-completeness wall; In cloud providers; Not dependable upon

Slide 14

Slide 14 text

“Over-grant” in RBAC, deny in VAP
- RBAC Role: allow in authorization
- RBAC Role Binding: allow in authorization
- CEL Policy: deny in admission
Chart (amount of permissions for lucas): Kubernetes RBAC allows create, update, delete on gateways with object=*, oldobject=*; the CEL rule denies .class != ‘test-gateway’; the desired permissions are .class == ‘test-gateway’.

Slide 15

Slide 15 text

Examples of logically unified authz + admission
- DRA Admin Access: allow when the old/new object has adminAccess == false

Slide 16

Slide 16 text

Examples of logically unified authz + admission
- DRA Admin Access: allow when the old/new object has adminAccess == false
- Ingress/Gateway: only list/watch Secrets with .type=tls and/or labelled ones

Slide 17

Slide 17 text

Examples of logically unified authz + admission
- DRA Admin Access: allow when the old/new object has adminAccess == false
- Ingress/Gateway: only list/watch Secrets with .type=tls and/or labelled ones
- Class: only some users are allowed to use certain classes

Slide 18

Slide 18 text

Examples of logically unified authz + admission
- DRA Admin Access: allow when the old/new object has adminAccess == false
- Ingress/Gateway: only list/watch Secrets with .type=tls and/or labelled ones
- Class: only some users are allowed to use certain classes
- Node bounds: agent to only see CRD with .nodeName=

Slide 19

Slide 19 text

Examples of logically unified authz + admission
- DRA Admin Access: allow when the old/new object has adminAccess == false
- Ingress/Gateway: only list/watch Secrets with .type=tls and/or labelled ones
- Class: only some users are allowed to use certain classes
- Node bounds: agent to only see CRD with .nodeName=
- CSR signers: some users can only use certain signers (today compound authz)

Slide 20

Slide 20 text

Examples of logically unified authz + admission
- DRA Admin Access: allow when the old/new object has adminAccess == false
- Ingress/Gateway: only list/watch Secrets with .type=tls and/or labelled ones
- Class: only some users are allowed to use certain classes
- Node bounds: agent to only see CRD with .nodeName=
- CSR signers: some users can only use certain signers (today compound authz)
- Impersonation: predicate on the full UserInfo shape that can be impersonated

Slide 21

Slide 21 text

These all boil down to label and field predicates

Slide 22

Slide 22 text

These all boil down to label and field predicates
The solution I present here, conditional authorization, is a superset of the “selectors for all verbs” KEP idea

Slide 23

Slide 23 text

Diagram labels: Fast, Safe, Analyzable, Expressive, Correct; RBAC, Webhooks, VAP; UX gap in authorization; Turing-completeness wall; In cloud providers; Not dependable upon

Slide 24

Slide 24 text

Diagram labels: Fast, Safe, Analyzable, Expressive, Correct; RBAC, Webhooks, VAP; Turing-completeness wall; Analyzability wall; In cloud providers; Not dependable upon

Slide 25

Slide 25 text

Diagram labels: Fast, Safe, Analyzable, Expressive, Correct; RBAC, Webhooks, VAP, NEW; Turing-completeness wall; Analyzability wall; In cloud providers; Not dependable upon

Slide 26

Slide 26 text

Analyzability wall? (n variables, m statements)

Variable data types
- Propositional Logic in SAT: Booleans only
- First-order Logic in SMT: Booleans, Strings, Ints, Objects, Arrays, Sets, Functions, etc.
Operators
- Propositional Logic in SAT: =, ¬, ⋀, ⋁, ⟹, ⟺
- First-order Logic in SMT: PL and e.g. +, -, <, has, etc.
Quantifiers
- Propositional Logic in SAT: None
- First-order Logic in SMT: forall (∀) and exists (∃)
Decidable
- Propositional Logic in SAT: Yes
- First-order Logic in SMT: Without quantifiers only*

* Church-Turing theorems 1937
⇒ No loops and quantifiers to keep analyzability

Slide 27

Slide 27 text

Why keep analyzability?
Partially order permissiveness of policy set
⇒ Check for logical inconsistencies in a policy set (shadowing, no-ops)
⇒ Check for equality (help refactors)
⇒ Prevent privilege escalation (in kube-apiserver)
Figure (allow policy vs. deny policy), annotated: No effect; Allow shadows allow; Deny shadows allow

Slide 28

Slide 28 text

Thankfully, CEL should mostly map to analyzable SMT

CEL only: e.all, e.exists, e.exists_one, e.map, e.filter, str.matches
CEL and analyzable SMT:
- str.contains, str.startsWith, str.endsWith
- uint, int, bool, string, bytes*, null, map**, list, double
- has, size***
- () . [] {} - (unary) ! * / % + - (binary) == != < > <= >= in && || ?:
Analyzable SMT only: uninterpreted functions + probably more

* not sure, didn’t dig very deep
** map of string values should work, but is a bit of work
*** might need some extra work

Some SMT solvers also support more advanced features, like regexps, but they might not always be analyzable for all cases, or standardized in the SMT-LIB standard.
Disclaimer: I’m not an expert

Slide 29

Slide 29 text

Strive to keep it simple
The logic analyzability constraint offers a guideline on how much expressiveness is reasonable at the authorization layer, and how much is not.
Loops, quantifiers, full-blown regexps, and the like would be “overkill” anyway.
* From an SMT solver like Z3 or cvc5

Slide 30

Slide 30 text

Strive to keep it simple
The logic analyzability constraint offers a guideline on how much expressiveness is reasonable at the authorization layer, and how much is not.
Loops, quantifiers, full-blown regexps, and the like would be “overkill” anyway.
But even all of decidable SMT might not be needed; e.g. we probably don’t want floats as part of an authz decision.
* From an SMT solver like Z3 or cvc5

Slide 31

Slide 31 text

Strive to keep it simple
The logic analyzability constraint offers a guideline on how much expressiveness is reasonable at the authorization layer, and how much is not.
Loops, quantifiers, full-blown regexps, and the like would be “overkill” anyway.
But even all of decidable SMT might not be needed; e.g. we probably don’t want floats as part of an authz decision.
If we maintain an encoding into decidable SMT, we get analyzability “for free”*.
* From an SMT solver like Z3 or cvc5

Slide 32

Slide 32 text

Strive to keep it simple
The logic analyzability constraint offers a guideline on how much expressiveness is reasonable at the authorization layer, and how much is not.
Loops, quantifiers, full-blown regexps, and the like would be “overkill” anyway.
But even all of decidable SMT might not be needed; e.g. we probably don’t want floats as part of an authz decision.
If we maintain an encoding into decidable SMT, we get analyzability “for free”*.
This subset of CEL might also map “just right” into the efficient-ish SQL encoding we want for the new unified selector syntax.
* From an SMT solver like Z3 or cvc5

Slide 33

Slide 33 text

Diagram labels: Fast, Safe, Analyzable, Expressive, Correct; RBAC, Webhooks, VAP, NEW; Turing-completeness wall; Analyzability wall; In cloud providers; Not dependable upon

Slide 34

Slide 34 text

Conditional Authorization for Write requests

Slide 35

Slide 35 text

Unify policy authoring for both authorization and admission
Diagram: Kubernetes path for a write request. Authentication, then Authorization (RequestInfo, UserInfo), which reaches the project over a Webhook. The project runs Partial Evaluation of its Policies, yielding Yes, No, or Maybe; a No results in a 403.

Slide 36

Slide 36 text

Partial Evaluation: Work with incomplete data
Even though the request object is not decoded at authorization time, we can resolve everything else based on request attributes.
We might get an unconditional allow/deny already! Otherwise, we get a “residual” expression over the unknown object.
CEL might be able to do this to some degree, but I haven’t tested it yet.
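A minimal sketch of the Yes / No / Maybe idea, assuming allow-only policies that are each split into a request-time predicate and an optional object-time predicate. The `Policy` class, `partial_evaluate`, and the example attributes are hypothetical names used only for illustration; they are not an existing CEL, Cedar, or Kubernetes API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Policy:
    # Evaluated at authorization time, over RequestInfo/UserInfo only.
    request_pred: Callable[[dict], bool]
    # Evaluated later, once the object is decoded. None means "no condition".
    object_pred: Optional[Callable[[dict], bool]] = None

def partial_evaluate(policies, request):
    """Return ("Yes", None), ("No", None) or ("Maybe", residual)."""
    residuals = []
    for policy in policies:
        if not policy.request_pred(request):
            continue  # cannot apply, whatever the object looks like
        if policy.object_pred is None:
            return "Yes", None  # unconditional allow
        residuals.append(policy.object_pred)
    if not residuals:
        return "No", None  # nothing can allow this request
    # The residual is the OR of all remaining object conditions.
    return "Maybe", lambda obj: any(pred(obj) for pred in residuals)

policies = [
    Policy(
        request_pred=lambda r: r["user"] == "lucas" and r["resource"] == "gateways",
        object_pred=lambda o: o.get("class") == "test-gateway",
    ),
    Policy(request_pred=lambda r: r["user"] == "admin"),
]

decision, residual = partial_evaluate(policies, {"user": "lucas", "resource": "gateways"})
print(decision)                               # Maybe
print(residual({"class": "test-gateway"}))    # True
```

In this sketch the residual would be what admission later evaluates against the decoded object; deny policies are left out to keep the example short.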

Slide 37

Slide 37 text

Unify policy authoring for both authorization and admission
Diagram: Kubernetes path for a write request. Authentication, then Authorization (RequestInfo, UserInfo), which reaches the project over a Webhook. The project runs Partial Evaluation of its Policies, yielding Yes, No, or Maybe; a No results in a 403.

Slide 38

Slide 38 text

Unify policy authoring for both authorization and admission
Diagram: Kubernetes path for a write request. Authentication, then Authorization (RequestInfo, UserInfo), then Admission Control (RequestInfo, UserInfo, Body), then Storage. Both stages reach the project over Webhooks. At authorization, the project runs Partial Evaluation of its Policies (Yes, No, Maybe); at admission, Full Evaluation (Yes, No). A No at either stage results in a 403.

Slide 39

Slide 39 text

Condition addition to SubjectAccessReview
Allow a SAR client to opt in to conditional authz through feature flags and e.g. an annotation on the SAR.
The SAR server responds with either “Allow + condition” or “Conditional + condition”.
⇒ This means that the SAR server might want to loop until the end to try to find an unconditional allow, instead of short-circuiting on a conditional. Moreover, multiple conditions might be ORed.
Note: Today, all of this can be handled out of core! But it’d be neater to have:
a) The Kube API server enforce the condition, instead of requiring catch-all admission
b) (Possibly in the long run) Some core API that can be “depended upon”
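The "loop until the end" behaviour could look roughly like the sketch below. It assumes a chain of authorizers that each return Allow, NoOpinion, or Conditional with an opaque condition string; the `Decision` type and `authorize_chain` function are hypothetical, and Deny handling is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Decision:
    verdict: str                     # "Allow", "NoOpinion" or "Conditional"
    condition: Optional[str] = None  # opaque expression, set only for "Conditional"

def authorize_chain(decisions: List[Decision]) -> Decision:
    """Prefer an unconditional allow; otherwise OR the collected conditions."""
    conditions = []
    for decision in decisions:
        if decision.verdict == "Allow":
            return Decision("Allow")  # unconditional allow wins outright
        if decision.verdict == "Conditional":
            # Do not short-circuit: a later authorizer may allow unconditionally.
            conditions.append(decision.condition)
    if conditions:
        return Decision("Conditional", " || ".join(f"({c})" for c in conditions))
    return Decision("NoOpinion")

result = authorize_chain([
    Decision("Conditional", "object.class == 'test-gateway'"),
    Decision("NoOpinion"),
    Decision("Conditional", "object.labels.env == 'test'"),
])
print(result.verdict, result.condition)
```

Joining conditions as strings is only for readability here; a real implementation would presumably combine parsed expressions.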

Slide 40

Slide 40 text

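Current state

Input
- (Cluster)RoleBinding: Username, Group, (Namespace), RoleRef
- (Cluster)Role: APIGroup, CombinedResource, (Name), (Namespace)
- ValidatingAdmissionPolicy: Username, Group, UID, User Extra, GVR, Subresource, Name, Namespace, GVK, New + Old Object, Ns Object, Authorizer
Operators
- (Cluster)RoleBinding: ==
- (Cluster)Role: ==, In
- ValidatingAdmissionPolicy: ==, !=, In, NotIn, Prefix, Suffix
Expression
- (Cluster)RoleBinding: Fixed
- (Cluster)Role: Fixed
- ValidatingAdmissionPolicy: Arbitrary
Scope
- (Cluster)RoleBinding: Subject
- (Cluster)Role: Object
- ValidatingAdmissionPolicy: Object
Applicability
- (Cluster)RoleBinding: Reads, Writes, SAR, Custom
- (Cluster)Role: Reads, Writes, SAR, Custom
- ValidatingAdmissionPolicy: Writes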

Slide 41

Slide 41 text

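Example with one unified authz + admission API

Input
- (Cluster)RoleBinding: Username, Group, (Namespace), RoleRef
- (Cluster)Role: APIGroup, CombinedResource, (Name), (Namespace)
- ValidatingAdmissionPolicy: Username, Group, UID, User Extra, GVR, Subresource, Name, Namespace, GVK, New + Old Object, Ns Object, Authorizer
- UnifiedAuthorization: Username, Group, UID, User Extra, APIGroup, CombinedResource, Name (!), (Namespace), (GVK), (New + Old Object), (Ns Object)
Operators
- (Cluster)RoleBinding: ==
- (Cluster)Role: ==, In
- ValidatingAdmissionPolicy: ==, !=, In, NotIn, Prefix, Suffix
- UnifiedAuthorization: ==, !=, In, NotIn, Prefix, Suffix
Expression
- (Cluster)RoleBinding: Fixed
- (Cluster)Role: Fixed
- ValidatingAdmissionPolicy: Arbitrary
- UnifiedAuthorization: Arbitrary
Scope
- (Cluster)RoleBinding: Subject
- (Cluster)Role: Object
- ValidatingAdmissionPolicy: Object
- UnifiedAuthorization: Object
Applicability
- (Cluster)RoleBinding: Reads, Writes, SAR, Custom
- (Cluster)Role: Reads, Writes, SAR, Custom
- ValidatingAdmissionPolicy: Writes
- UnifiedAuthorization: Reads, Writes, SAR, Custom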

Slide 42

Slide 42 text

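Example if we want a two-layer model

Input
- ConditionalRoleBinding: Username, Group, UID, User Extra, (Namespace), RoleRef
- ConditionalRole: Username, Group, UID, User Extra, APIGroup, CombinedResource, Name, (Namespace), (GVK), (New + Old Object), (Ns Object)
- ValidatingAdmissionPolicy: Username, Group, UID, User Extra, GVR, GVK, Subresource, Name, Namespace, New + Old Object, Ns Object, Authorizer
Operators
- ConditionalRoleBinding: ==, !=, In, NotIn, Prefix, Suffix
- ConditionalRole: ==, !=, In, NotIn, Prefix, Suffix
- ValidatingAdmissionPolicy: ==, !=, In, NotIn, Prefix, Suffix
Expression
- ConditionalRoleBinding: Arbitrary
- ConditionalRole: Arbitrary
- ValidatingAdmissionPolicy: Arbitrary
Scope
- ConditionalRoleBinding: Subject
- ConditionalRole: Object
- ValidatingAdmissionPolicy: Object
Applicability
- ConditionalRoleBinding: Reads, Writes, SAR, Custom
- ConditionalRole: Reads, Writes, SAR, Custom
- ValidatingAdmissionPolicy: Writes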

Slide 43

Slide 43 text

Conditional Authorization for Read requests

Slide 44

Slide 44 text

UX for matching label and field selectors today
Example from RBAC++:
request.resourceAttributes.fieldSelector.requirements.exists(r, r.key == "type" && r.operator == "=" && sets.equivalent(r.values, ["mytype"]))
How it would be written for admission:
object.type == "mytype"
The former is request-scoped/oriented; the latter is object-scoped and clearly more user-friendly.

Slide 45

Slide 45 text

Selector dimensionality
Visual of two ORed allow rule conditions:
(labels.env != "prod" && labels.owner in ["team-1", "team-2"]) || (labels.env == "test")
There are 22 possible label selectors that would be allowed by these policies.
How can we check that every object that could be returned from storage is authorized?

Slide 46

Slide 46 text

Selector dimensionality
The naive way would be to perform one check per object that could be matched.
E.g. the selectors “owner in (‘team-1’, ‘team-2’), env in (‘test’, ‘dev’)” match 4 “archetypes” of objects.
In this case, authorized!

Slide 47

Slide 47 text

Selector dimensionality
The naive way would be to perform one check per object that could be matched.
E.g. the selectors “owner in (‘team-2’, ‘team-3’), env in (‘test’, ‘dev’)” match 4 “archetypes” of objects.
In this case, not authorized!
Concrete counterexample: [figure]
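A minimal sketch of the naive per-archetype check, assuming selectors that only use `in` requirements so that the matched archetypes can be enumerated. The `is_authorized` function transcribes the two ORed allow conditions from the earlier slide; the helper names and the dictionary-based selector format are hypothetical.

```python
from itertools import product

def is_authorized(labels):
    """The two ORed allow conditions from the 'Selector dimensionality' slide."""
    env, owner = labels["env"], labels["owner"]
    return (env != "prod" and owner in ("team-1", "team-2")) or env == "test"

def archetypes(selector):
    """Expand an In-only selector {key: allowed values} into every combination."""
    keys = sorted(selector)
    for values in product(*(selector[k] for k in keys)):
        yield dict(zip(keys, values))

def unauthorized_archetypes(selector):
    """Empty list means every object the selector can match is authorized."""
    return [a for a in archetypes(selector) if not is_authorized(a)]

first = {"owner": ["team-1", "team-2"], "env": ["test", "dev"]}
second = {"owner": ["team-2", "team-3"], "env": ["test", "dev"]}

print(unauthorized_archetypes(first))   # []
print(unauthorized_archetypes(second))  # [{'env': 'dev', 'owner': 'team-3'}]
```

Under these assumptions the first selector is fully authorized, while the second fails on the archetype with env=dev and owner=team-3, since that object satisfies neither allow condition.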

Slide 48

Slide 48 text

However, explicit enumeration doesn’t work with NotExists, !=, NotIn, because then the number of possibly selected objects is infinite

Slide 49

Slide 49 text

Unify policy authoring targeting selectors for reads and writes
Diagram: Kubernetes path for a read request. Authentication, then Authorization (RequestInfo, UserInfo, Selectors), then Storage. The project runs Full Evaluation of its Policies (Yes, No); a No results in a 403.
Example selectors:
“Label owner in (‘team-1’, ‘team-2’), env in (‘test’, ‘dev’)”
“Field .spec.gatewayClassName != ‘production’”

Slide 50

Slide 50 text

Unify policy authoring targeting selectors for reads and writes
Diagram: Kubernetes path for a read request. Authentication, then Authorization (RequestInfo, UserInfo, Selectors), then Storage; a denial results in a 403. The project evaluates its Policies in two steps:
1. Partial Evaluation => Yes, No, Maybe
2. If Maybe, turn Selectors and Residual into SMT => Yield Yes or No
Authorize IFF: ∀o : objectSelected(o) ⇒ isAuthorized(o), which is equivalent to (∃o : objectSelected(o) ∧ ¬isAuthorized(o)) being UNSAT
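The equivalence used in the authorization criterion follows from standard quantifier duality. Writing S(o) for objectSelected(o) and A(o) for isAuthorized(o) (shorthand introduced here, not on the slide):

```latex
\begin{align*}
\forall o.\ \bigl(S(o) \Rightarrow A(o)\bigr)
  &\iff \neg\,\exists o.\ \neg\bigl(S(o) \Rightarrow A(o)\bigr) \\
  &\iff \neg\,\exists o.\ \neg\bigl(\neg S(o) \lor A(o)\bigr) \\
  &\iff \neg\,\exists o.\ \bigl(S(o) \land \neg A(o)\bigr)
\end{align*}
```

So the request is authorized exactly when the formula S(o) ∧ ¬A(o) has no satisfying object, which is the UNSAT answer a solver would return.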

Slide 51

Slide 51 text

Example encoding
isAuthorized(o) = (o.labels.env != "prod" && o.labels.owner in ["team-1", "team-2"]) || (o.labels.env == "test")
objectSelected(o) = o.labels.env in ["test", "dev"] && o.labels.owner in ["team-1", "team-2"]
∀o: objectSelected(o) ⇒ isAuthorized(o) IFF (∃o: objectSelected(o) ∧ ¬isAuthorized(o)) is UNSAT
Result is UNSAT => Request authorized!
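The check can be imitated without a solver for this particular example. The sketch below is a brute-force stand-in for the SMT query, not an SMT encoding. It assumes that every object carries both labels and that predicates only compare a label against constants, so one placeholder value (`<other>`, a hypothetical name) can represent every value not mentioned in a predicate. The function and variable names are illustrative only.

```python
from itertools import product

OTHER = "<other>"  # stands for any label value not named in a predicate

def is_authorized(labels):
    return (labels["env"] != "prod" and labels["owner"] in ("team-1", "team-2")) \
        or labels["env"] == "test"

def object_selected(labels):
    return labels["env"] in ("test", "dev") and labels["owner"] in ("team-1", "team-2")

# Every constant mentioned in either predicate, plus the placeholder.
DOMAIN = {
    "env": ["prod", "test", "dev", OTHER],
    "owner": ["team-1", "team-2", OTHER],
}

def find_counterexample(selected, authorized, domain):
    """Search for an object with selected(o) and not authorized(o)."""
    keys = sorted(domain)
    for values in product(*(domain[k] for k in keys)):
        obj = dict(zip(keys, values))
        if selected(obj) and not authorized(obj):
            return obj  # "SAT": a selected but unauthorized object exists
    return None         # "UNSAT": every selected object is authorized

print(find_counterexample(object_selected, is_authorized, DOMAIN))  # None
```

Unlike plain enumeration of selector values, the placeholder also lets this sketch handle a `!=` selector: `find_counterexample(lambda l: l["env"] != "prod", is_authorized, DOMAIN)` returns an object with env=dev and an unlisted owner.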

Slide 52

Slide 52 text

Implementation of this using Cedar

Slide 53

Slide 53 text

Published 2024

Slide 54

Slide 54 text

Open Source Authorization Engine

Slide 55

Slide 55 text

Open Source Authorization Engine
- Aims to be expressive, fast, safe, and analyzable

Slide 56

Slide 56 text

Open Source Authorization Engine
- Aims to be expressive, fast, safe, and analyzable
- Maintains a decidable encoding into Satisfiability Modulo Theories

Slide 57

Slide 57 text

Open Source Authorization Engine
- Aims to be expressive, fast, safe, and analyzable
- Maintains a decidable encoding into Satisfiability Modulo Theories
- Supports RBAC, ReBAC and ABAC paradigms

Slide 58

Slide 58 text

Open Source Authorization Engine
- Aims to be expressive, fast, safe, and analyzable
- Maintains a decidable encoding into Satisfiability Modulo Theories
- Supports RBAC, ReBAC and ABAC paradigms
- AWS is donating Cedar to the CNCF

Slide 59

Slide 59 text

Formal Verification
Image Source: https://aws.amazon.com/blogs/opensource/lean-into-verified-software-development/

Slide 60

Slide 60 text

1. Improve policy authoring usability with typed schema

Slide 61

Slide 61 text

1. Improve policy authoring usability with typed schema
Diagram: the Kubernetes API Server (/openapi/v3/ and the API Discovery Document at /apis//) feeds the project, which produces a Schema for the IDE dev loop.

Slide 62

Slide 62 text

2. Unify policy authoring for both authorization and admission
Previous example shown in the project’s proposed syntax.
Only one policy object is needed, not three like before.

Slide 63

Slide 63 text

3. Unify policy authoring targeting selectors for reads and writes
The last example, but for any action, including reads.
Predicates targeting resource.stored determine if a concrete object is allowed to be read from storage.

Slide 64

Slide 64 text

4. Analyze: Which policy is larger?
Figure: an old policy and a new policy, side by side.

Slide 65

Slide 65 text

Use CEL x Cedar intersection to allow both “frontends”
Intersection of CEL and Cedar:
- str.contains, str.startsWith, str.endsWith
- uint, int, bool, string, bytes*, null, map**, list, double
- has, size***
- () . [] {} - (unary) ! * / % + - (binary) == != < > <= >= in && || ?:
Cedar is reducible to SMT in open source, thus AuthzCEL → Cedar → SMT

Slide 66

Slide 66 text

5. Write backend once, use for multiple “frontends”
Diagram: the frontends (Kubernetes CEL (portion w/o loops), Kubernetes RBAC, New Selector-based Authorization paradigm?, New Multi-cluster Policies?) produce Policies for the project’s Engine, which is backed by SMT Solvers.

Slide 67

Slide 67 text

Takeaways
- I think (happy to be proven wrong now rather than later) that conditional authorization is feasible to move forward upstream (to KEP) in a form like this.
- If we restrict ourselves to SMT-analyzable expressions, users can interface with either CEL or Cedar, or something else that they prefer.
- Privilege escalation analysis (policy ordering) might help us catch unexpected things like “I can edit an object such that it becomes readable for me”.
- Cedar policies are nice to write in that they provide an instant IDE validation flow.
- Users can be provided with a uniform experience across reads/writes and authorization/admission through this primitive.
- Cedar could offload some of the complexities here, like analysis, if we want / need.

Slide 68

Slide 68 text

Should I write up a KEP for this?

Slide 69

Slide 69 text

Future Work / Ideas
- Is it worth integrating some of the parameter ideas of VAP into this, or does that get too complex?
- Sometimes, a semantic property is deeply nested in the object, and the conditional authorization layer (without loops) is unable to check it. Should we recommend that API authors in this case add some kind of “top-level” flag/enum on the object that enables the privileged behavior, and then enforce this in validation?

Slide 70

Slide 70 text

Ready-made answers to assumed questions
- This does NOT make authorization decisions dependent on object state. Authorization here is still dependent only on policies.
- This should work for API aggregation use-cases as well, even though the Kube API server doesn’t have access to the request body.
- We should still recommend that people design their APIs for specific personas, and not subdivide namespaces.
- We might want some way to enforce immutability once set for selectable fields and labels.
- Cedar released a CLI for policy analysis.
- Cedar is written in Rust, but provides FFI and Wasm bindings.

Slide 71

Slide 71 text

Thanks! Please give feedback!
Email: [email protected]
Bluesky: @luxas.dev
LinkedIn: luxas
CNCF/Kubernetes Slack: luxas
Credits: Icons by Flaticon