Slide 1

Slide 1 text

Exploring modern and secure operations of Kubernetes clusters on the Edge Lucas Käldström - CNCF Ambassador 9th of June, 2021 - Open Data Science Conference (Online) Image credit: @ashleymcnamara

Slide 2

Slide 2 text

@kubernetesonarm $ whoami Lucas Käldström, 2nd-year student at Aalto, 21 yo CNCF Ambassador, Certified Kubernetes Administrator and former Kubernetes WG/SIG Lead KubeCon Speaker in Berlin, Austin, Copenhagen, Shanghai, Seattle, Barcelona (Keynote) & San Diego Former Kubernetes approver and subproject owner, active in the community for 5+ years. Worked on e.g. SIG Cluster Lifecycle => kubeadm to GA. Weaveworks contractor since 2017 Weave Ignite and Racklet co-author Cloud Native Nordics co-founder & meetup organizer

Slide 3

Slide 3 text

@kubernetesonarm Warning: This talk does not cover MLOps directly; it tells you what to look into to be able to do MLOps well on the edge

Slide 4

Slide 4 text

@kubernetesonarm 1. Secure the boot process Part 1: Edge Security

Slide 5

Slide 5 text

@kubernetesonarm What’s complex with booting a machine? 1. Correctness and openness of the boot binaries a. If the binary is proprietary, how can you be sure it doesn’t have a backdoor or CVE? 2. Duplication of important drivers across bootloaders a. Most bootloaders are written in C, and implement e.g. a UDP driver by themselves 3. Introspection of the boot flow a. Ensure that the right (read: not malicious) binaries were used, in the right order 4. Remote verification of an edge node being safe a. How can a remote actor ensure that a remote node is in a well-known state?

Slide 6

Slide 6 text

@kubernetesonarm High-level boot steps on embedded devices 1. On-chip bootloader in ROM, unchangeable a. This code is fixed at manufacture-time, and often has very limited functionality 2. Hardware initialization bootloader step a. Loaded from e.g. SPI Flash, EEPROM, or SD Card by 1). b. Often subject to tight size constraints in the CPU cache/SRAM, e.g. 512 kB. c. Initializes (D)RAM. May run the “ARM stub”, a short piece of hardware init code. d. Open-source projects include u-boot SPL and Coreboot. 3. Raw executable to load into RAM a. Can be Linux or some other “bare-metal” executable like TianoCore EDK2 or u-boot b. Loaded into the beginning of RAM and executed by 2)

Slide 7

Slide 7 text

@kubernetesonarm Open Source Bootloaders Coreboot is a bootloader popular with x86 devices, but it also supports ARM and other platforms. Oreboot is like Coreboot, but with all C replaced by Rust. u-boot is the most popular bootloader for ARM SBCs. Join the Open Source Firmware movement! ===> https://slack.osfw.dev/

Slide 8

Slide 8 text

@kubernetesonarm Example: Raspberry Pi 4 (boot-chain diagram) The on-chip ROM (closed source) loads the EEPROM bootloader (< 512 kB) via TFTP, SD Card, USB or SSD; the EEPROM bootloader loads the start.elf bootloader (~2 MB, closed source, complex custom logic) together with config.txt, the Device Tree and the ARM stub as “boot files”; the BCM GPU/SoC initializes RAM and sets up the CPUs; the CPU then runs kernel8.img* (open source). What now? * Can be Linux, TianoCore EDK2 and/or u-boot

Slide 9

Slide 9 text

@kubernetesonarm Ideal Scenario? (boot-chain diagram) The on-chip ROM (closed source) loads u-boot SPL** (< 1 MB***, open source, “de facto” standard with a common codebase) from SPI, SD Card or USB together with a boot script, the Device Tree and the ARM stub / HW init as “boot files”; the SoC initializes RAM and sets up the CPUs; the CPU then runs kernel8.img* (open source). * Can be Linux, TianoCore EDK2 and/or u-boot ** Or Coreboot, Oreboot or UEFI PEI *** Approximation, board-specific

Slide 10

Slide 10 text

@kubernetesonarm Hey! We’re now missing out on netbooting?!?

Slide 11

Slide 11 text

@kubernetesonarm Yay, now addressed problem 1: Correctness and openness of the boot binaries

Slide 12

Slide 12 text

@kubernetesonarm Trusted Firmware-A (TF-A) + OP-TEE Similar to Intel SGX*; partitions the CPU into a “secure” and a “non-secure” part. OP-TEE is the “secure world” OSS implementation (has Rust and Go libraries ready). TF-A is packaged as the “ARM stub” that is run right before the main executable. * However, it seems like TF-A implements a subset of SGX’s features

Slide 13

Slide 13 text

@kubernetesonarm Add “secure world” with TF-A (boot-chain diagram) The on-chip ROM (closed source) loads u-boot SPL (< 1 MB, open source, “de facto” standard with a common codebase) from SPI, SD Card or USB together with a boot script, the Device Tree and TF-A as “boot files”; the SoC initializes RAM and sets up the CPUs; the CPU then runs kernel8.img in the “non-secure” world, with the OP-TEE implementation in the “secure” world.

Slide 14

Slide 14 text

@kubernetesonarm 1. Secure the boot process 2. Securely net-booting the OS Part 1: Edge Security

Slide 15

Slide 15 text

@kubernetesonarm LinuxBoot Idea: Replace UEFI or a “high-level” bootloader with a minimal Linux for better security, reproducibility and transparency. => Any Python developer can now be a firmware developer kexec: Change the running kernel without rebooting Only compile in what you need

Slide 16

Slide 16 text

@kubernetesonarm Compare “traditional” UEFI vs LinuxBoot (diagram) Traditional UEFI: Hardware Initialization -> Fetch OS to boot -> Run desired OS. LinuxBoot way: Hardware Initialization (UEFI PEI) -> Fetch OS to boot -> Run desired OS.

Slide 17

Slide 17 text

@kubernetesonarm u-root The first/only process running in your LinuxBoot, packaged inside an initramfs in the kernel. Written in Go, OSS on GitHub. Include any external binary, or write your own boot logic in Go. Provides a set of common boot logic written in a memory-safe language, e.g. provides a kexec method
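As a flavor of what “boot logic in Go” can look like, here is a minimal sketch (not u-root's actual API) that stages a new kernel with the kexec_file_load(2) syscall via golang.org/x/sys/unix and then jumps into it. The file paths and kernel command line are purely illustrative; a real u-root command would fetch and verify those files first.

```go
// kexec_sketch.go - minimal sketch of custom boot logic in Go (Linux only).
// Assumes /tmp/kernel and /tmp/initrd have already been fetched and verified;
// the paths and kernel command line are illustrative.
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	kernel, err := os.Open("/tmp/kernel")
	if err != nil {
		log.Fatal(err)
	}
	initrd, err := os.Open("/tmp/initrd")
	if err != nil {
		log.Fatal(err)
	}

	// Stage the new kernel via the kexec_file_load(2) syscall.
	cmdline := "console=ttyS0 root=/dev/ram0"
	if err := unix.KexecFileLoad(int(kernel.Fd()), int(initrd.Fd()), cmdline, 0); err != nil {
		log.Fatalf("kexec_file_load: %v", err)
	}

	// Jump into the staged kernel without going through firmware again.
	if err := unix.Reboot(unix.LINUX_REBOOT_CMD_KEXEC); err != nil {
		log.Fatalf("reboot(kexec): %v", err)
	}
}
```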

Slide 18

Slide 18 text

@kubernetesonarm Add in LinuxBoot, not u-boot or TianoCore On-chip ROM SPI SD Card USB u-boot SPL < 1MB Boot Script Device Tree TF-A RAM initialized CPUs set-up “Boot files” SoC CPU µLinux Closed Source Open Source *µLinux <16MB “Non-secure” “Secure” OP-TEE impl u-root kexec Linux

Slide 19

Slide 19 text

@kubernetesonarm The Update Framework (TUF) “A framework for securing software update systems” A set of steps to follow to ensure software updates or artifacts are not (at least easily) compromised by attackers. Used in e.g. “docker pull/push” (Notary) and Bottlerocket. Graduated CNCF project.
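To make the core idea concrete, the toy sketch below (Go standard library only, not a real TUF client) accepts an artifact only if its length and SHA-256 hash match what the targets metadata records; a real client such as go-tuf or Notary additionally verifies the signatures and role/root-of-trust chain on that metadata, which this sketch omits entirely.

```go
// tuf_target_check.go - toy illustration of TUF's "targets" idea: an artifact
// is only accepted if it matches the hash/length recorded in (signed) metadata.
// Real clients (e.g. go-tuf, Notary) also verify the metadata signatures.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// Target mirrors the shape of one entry under "targets" in TUF metadata.
type Target struct {
	Length int64             `json:"length"`
	Hashes map[string]string `json:"hashes"` // e.g. {"sha256": "..."}
}

func verify(path string, t Target) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if int64(len(data)) != t.Length {
		return errors.New("length mismatch")
	}
	sum := sha256.Sum256(data)
	if hex.EncodeToString(sum[:]) != t.Hashes["sha256"] {
		return errors.New("sha256 mismatch")
	}
	return nil
}

func main() {
	// Illustrative metadata snippet; in reality this comes from signed targets.json.
	raw := []byte(`{"length": 4, "hashes": {"sha256": "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"}}`)
	var t Target
	if err := json.Unmarshal(raw, &t); err != nil {
		panic(err)
	}
	fmt.Println(verify("artifact.bin", t)) // accept only if hash and length match
}
```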

Slide 20

Slide 20 text

@kubernetesonarm Open Containers Initiative The de-facto container format. Donated to the Linux Foundation by Docker in 2015; now industry-standard. Provides specifications for a container image (packaging), runtime (isolation) and registry (distribution). Evolving into a generic artifact distribution mechanism with e.g. Deis’ ORAS.

Slide 21

Slide 21 text

@kubernetesonarm LinuxBoot + u-root + OCI/ORAS = ociboot* Idea: We want netbooting and a “modern” way to build the OS. Read: Build your OS image using a Dockerfile. OCI provides image packaging/distribution. With LinuxBoot + u-root one can write “firmware” in Go that downloads OCI images and kexecs them. ORAS allows any image format (e.g. qcow2) instead of OCI, but still uses the OCI distribution part. * This project is still in the idea stage, not implemented yet

Slide 22

Slide 22 text

@kubernetesonarm ociboot workflow (diagram) LinuxBoot running ociboot performs an OCI pull from an OCI Registry (with optional pull secrets), extracts the OS image onto a RAM disk, and kexecs into the target OS. Benefits: 1. All complex “netboot” logic written in Go 2. Re-use all device drivers from Linux 3. Works regardless of compute type 4. Notary used behind the scenes for verification
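Since ociboot is still only an idea, the following is just a sketch of what its core step could look like using google/go-containerregistry: pull the OS image and flatten its filesystem to a tarball, which would then be unpacked onto a RAM disk and handed to a kexec step like the one sketched earlier. The image reference and paths are made up.

```go
// ociboot_sketch.go - rough sketch of the ociboot idea (not a real project yet):
// pull an OS image from an OCI registry and export its root filesystem, ready
// for a kexec step. Reference and paths are illustrative.
package main

import (
	"log"
	"os"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Pull the OS image; credentials/pull secrets could be added via crane options.
	img, err := crane.Pull("registry.example.com/edge/os-image:v1")
	if err != nil {
		log.Fatalf("pull: %v", err)
	}

	// Export the flattened filesystem as a tarball; a real ociboot would unpack
	// it onto a RAM disk (e.g. tmpfs) and locate the kernel/initramfs inside it.
	out, err := os.Create("/tmp/rootfs.tar")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if err := crane.Export(img, out); err != nil {
		log.Fatalf("export: %v", err)
	}
	log.Println("rootfs extracted; next step would be to kexec into the new kernel")
}
```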

Slide 23

Slide 23 text

@kubernetesonarm LinuxBoot + u-root + TUF = tufboot* ociboot would use Notary under the hood for the OCI pull. However, Notary is only one implementation of TUF; TUF is flexible enough for many more (advanced) trust & delegation flows. tufboot would download any supported artifact securely given a TUF Root of Trust JSON spec. ociboot & tufboot could be integrated into u-root or webboot. * This project is still in the idea stage, not implemented yet

Slide 24

Slide 24 text

@kubernetesonarm tufboot workflow (diagram) LinuxBoot running tufboot performs a TUF download from an S3 bucket / HTTP(S) server, given a TUF Root of Trust JSON, extracts the OS image onto a RAM disk, and kexecs into the target OS. Benefits: 1. All complex “netboot” logic written in Go 2. Re-use all device drivers from Linux 3. Works regardless of compute type 4. Advanced trust delegations possible

Slide 25

Slide 25 text

@kubernetesonarm Yay, now addressed problem 2: Duplication of important drivers across bootloaders

Slide 26

Slide 26 text

@kubernetesonarm 1. Secure the boot process 2. Securely net-booting the OS 3. Remote Attestation Part 1: Edge Security

Slide 27

Slide 27 text

@kubernetesonarm Remote Attestation Problem: How can I trust that a machine (e.g. on the edge) booted correctly and wasn’t tampered with? If the “cloud” part needs to give it high-privilege credentials for accessing sensitive resources/data, it should be confident that the machine is safe first. Remote Attestation is a way to solve this problem.

Slide 28

Slide 28 text

@kubernetesonarm Trusted Platform Module (TPM) “a dedicated microcontroller designed to secure hardware through integrated cryptographic keys.“ Can generate keys, sign/verify/encrypt/decrypt data, and store Platform Configuration Registers (PCR) PCRs can only be extended, not set: pcr[i] = hash(pcr[i] || extendArg)
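The extend rule is easy to emulate in software. The sketch below simulates a SHA-256 PCR bank in plain Go (it does not talk to a real TPM) and shows that the final value depends on every measurement and on the order in which they were made.

```go
// pcr_extend.go - software simulation of the TPM extend operation:
// pcr = SHA-256(pcr || measurement). This does not talk to a real TPM.
package main

import (
	"crypto/sha256"
	"fmt"
)

// extend folds a new measurement into the PCR value and returns the new value.
func extend(pcr, measurement []byte) []byte {
	h := sha256.New()
	h.Write(pcr)
	h.Write(measurement)
	return h.Sum(nil)
}

func main() {
	pcr := make([]byte, sha256.Size) // PCRs start out as all zeroes

	// Measure each boot stage in order; the final value depends on every stage
	// and on the order in which the stages were measured.
	for _, stage := range [][]byte{
		[]byte("u-boot SPL"),
		[]byte("LinuxBoot"),
		[]byte("target kernel"),
	} {
		digest := sha256.Sum256(stage)
		pcr = extend(pcr, digest[:])
	}
	fmt.Printf("final PCR value: %x\n", pcr)
}
```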

Slide 29

Slide 29 text

@kubernetesonarm Trusted Platform Module (TPM) PCRs form a good way to seal, not just encrypt, data. Sealing means “encrypt with both a key and the PCRs”. In a conventional Static Root of Trust for Measurements (SRTM) flow, the PCR register is extended with the hash of the next boot executable before it is executed. => Unsealing can only happen if all boot binaries are correct

Slide 30

Slide 30 text

@kubernetesonarm Yay, now addressed problem 3: Introspection of the boot flow

Slide 31

Slide 31 text

@kubernetesonarm Remote Attestation A nonce (random number) prevents replay attacks. Nonce + PCRs => Quote. The quote is validated by the server. The server can send a secret back, encrypted by the TPM key. Image from Simma, Armin (2015): Trusting Your Cloud Provider. Protecting Private Virtual Machines.

Slide 32

Slide 32 text

@kubernetesonarm SRTM (diagram) Boot chain: On-chip ROM (closed source) -> u-boot SPL -> CPUs set up on the SoC -> LinuxBoot -> kexec -> Target Linux (open source), with the TPM’s PCR 0 extended along the way: 1. Extend with u-boot SPL hash 2. Extend with LinuxBoot hash 3. Extend with Target Linux hash. Attestation: 4. The app (client) asks the Attestation Server for a random nonce and the PCR list 5. Gets a signed PCR quote from the TPM 6. The server validates that the sent quote matches the reference quote 7. Encrypts the App Secret with the EK 8. The app decrypts the App Secret with the EK
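On the verification side, the server's job in steps 4–6 boils down to replaying the known-good measurements into a reference PCR value and comparing it, together with the nonce, against what the client reported. Here is a toy version of that check; a real verifier (e.g. built on google/go-attestation) must also validate the TPM's signature over the quote, which this sketch skips.

```go
// attest_verify.go - toy server-side check for the SRTM flow above: replay the
// known-good measurements to get a reference PCR and compare it (together with
// the nonce) against what the client reported. Signature checks are omitted.
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func extend(pcr, digest []byte) []byte {
	h := sha256.New()
	h.Write(pcr)
	h.Write(digest)
	return h.Sum(nil)
}

// referencePCR replays the expected boot measurements in order.
func referencePCR(knownGoodHashes [][]byte) []byte {
	pcr := make([]byte, sha256.Size)
	for _, d := range knownGoodHashes {
		pcr = extend(pcr, d)
	}
	return pcr
}

// verifyQuote checks the reported PCR and nonce against server expectations.
func verifyQuote(reportedPCR, nonce, expectedNonce []byte, knownGoodHashes [][]byte) bool {
	if !bytes.Equal(nonce, expectedNonce) {
		return false // stale or replayed quote
	}
	return bytes.Equal(reportedPCR, referencePCR(knownGoodHashes))
}

func main() {
	spl := sha256.Sum256([]byte("u-boot SPL"))
	lb := sha256.Sum256([]byte("LinuxBoot"))
	krn := sha256.Sum256([]byte("target kernel"))
	known := [][]byte{spl[:], lb[:], krn[:]}

	nonce := []byte("random-per-request-nonce")
	reported := referencePCR(known) // pretend the client booted the right binaries

	fmt.Println("quote accepted:", verifyQuote(reported, nonce, nonce, known))
}
```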

Slide 33

Slide 33 text

@kubernetesonarm Yay, now addressed problem 4: Remote verification of an edge node being safe

Slide 34

Slide 34 text

@kubernetesonarm Good resources on TPMs Remote Attestation helper scripts, sealing keys using PCRs, and more https://safeboot.dev/ StackExchange answer on SRTM vs DRTM google/go-attestation

Slide 35

Slide 35 text

@kubernetesonarm 4. Kubernetes Cluster Lifecycle Part 2: Edge Automation

Slide 36

Slide 36 text

@kubernetesonarm What’s complex with managing a cluster? 5. Reinventing the wheel for Kubernetes mgmt a. Don’t write it all yourself. Use upstream, community-backed building blocks 6. Interoperability between many different providers a. Tasks like creating, upgrading and autoscaling clusters can be wildly different 7. Declaratively controlling a fleet of edge clusters a. Making sure a set of edge clusters stay in sync at all times is a challenge 8. Keeping ingested edge data in sync with the cloud a. How to deal with network interruptions of edge clusters without app modification

Slide 37

Slide 37 text

@kubernetesonarm kubeadm = the official tool to bootstrap a minimum viable, best-practice Kubernetes cluster (layer diagram) Layer 1: Cluster API (Cluster API Spec and implementation; cloud provider, load balancers, machines, infrastructure) Layer 2: kubeadm (bootstrapping and the Kubernetes API; each of Control Plane 1..N and Node 1..N runs kubeadm) Layer 3: Addon Operators (addons, monitoring, logging)

Slide 38

Slide 38 text

@kubernetesonarm kubeadm vs an “end-to-end solution” (diagram: the same layer stack — Cluster API Spec and implementation, cloud provider, load balancers, monitoring, logging, addons, Kubernetes API, bootstrapping, machines, infrastructure, with kubeadm on Control Plane 1..N and Node 1..N — all of which an end-to-end solution covers) kubeadm is built to be part of a higher-level solution

Slide 39

Slide 39 text

@kubernetesonarm k3s A kubeadm-like deployment mechanism where all Kubernetes components are integrated into one binary. That, combined with the removal of a set of features not needed at the edge, means a small binary footprint. CNCF sandbox project. More opinionated than kubeadm, which means it is easier to get going with, but less extensible.

Slide 40

Slide 40 text

@kubernetesonarm Yay, now addressed problem 5: Reinventing the wheel for Kubernetes mgmt

Slide 41

Slide 41 text

@kubernetesonarm Cluster API The next step after kubeadm “To make the management of (X) clusters across (Y) providers simple, secure, and configurable.” “How can I manage any number of clusters in a similar fashion to how I manage deployments in Kubernetes?”

Slide 42

Slide 42 text

@kubernetesonarm Declarative clusters ● With Kubernetes we manage our applications declaratively a. Why not for the cluster itself? ● With the Cluster API, we can declaratively define the desired cluster state a. Operator implementations reconcile the state b. Use Spec & Status like the rest of k8s c. Common management solutions for e.g. upgrades, autoscaling and repair d. Allows for GitOps workflows

Example manifest:

apiVersion: cluster.x-k8s.io/v1alpha4
kind: MachineDeployment
metadata:
  name: "test-cluster-md-0"
spec:
  clusterName: "test-cluster"
  replicas: 3
  template:
    spec:
      clusterName: "test-cluster"
      version: v1.20.1
      bootstrap:
        configRef:
          name: "test-cluster-md-0"
          apiVersion: bootstrap.cluster.x-k8s.io/v1alpha4
          kind: EKSConfigTemplate
      infrastructureRef:
        name: "test-cluster-md-0"
        apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
        kind: AWSMachineTemplate
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha4
kind: AWSMachineTemplate
metadata:
  name: "test-cluster-md-0"
spec:
  template:
    spec:
      instanceType: "standard-4vcpu-8gb"
      iamInstanceProfile: "test-iam-profile"
      sshKeyName: "my-personal-ssh-key"

Slide 43

Slide 43 text

@kubernetesonarm Yay, now addressed problem 6: Interoperability between many different providers

Slide 44

Slide 44 text

@kubernetesonarm 4. Kubernetes Cluster Lifecycle 5. Automate the edge with GitOps Part 2: Edge Automation

Slide 45

Slide 45 text

@kubernetesonarm GitOps: A cloud-native paradigm GitOps, coined by Alexis Richardson, CEO of Weaveworks Idea: Declaratively describe the desired state of all your infrastructure in a versioned backend like Git, and have controllers execute towards that state. Observe-diff-act Allows for better reproducibility, drift detection, mean-time-to-recovery, control, and more.
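Observe-diff-act is the same reconciliation pattern Kubernetes controllers use. Here is a bare-bones skeleton of the loop, written in Go with placeholder observe/act functions (nothing Flux-specific):

```go
// gitops_loop.go - skeleton of the observe-diff-act loop behind GitOps.
// The observe/act functions are placeholders, not Flux APIs.
package main

import (
	"fmt"
	"time"
)

type state map[string]string // toy model: resource name -> version

func observeDesired() state { return state{"podinfo": "v2"} } // e.g. read from Git
func observeActual() state  { return state{"podinfo": "v1"} } // e.g. read from the cluster

// diff returns what needs to change to make actual match desired.
func diff(desired, actual state) map[string]string {
	changes := map[string]string{}
	for name, want := range desired {
		if actual[name] != want {
			changes[name] = want
		}
	}
	return changes
}

func act(changes map[string]string) {
	for name, version := range changes {
		fmt.Printf("applying %s -> %s\n", name, version) // e.g. an API-server call
	}
}

func main() {
	for i := 0; i < 3; i++ { // a real controller loops forever / reacts to events
		act(diff(observeDesired(), observeActual()))
		time.Sleep(time.Second)
	}
}
```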

Slide 46

Slide 46 text

@kubernetesonarm Source: GitOps Today and Tomorrow: Conceptual Overview and Technical Deep Dive – Cornelia Davis

Slide 47

Slide 47 text

@kubernetesonarm Flux: The GitOps Engine The original GitOps implementation. Syncs desired state from Git, Helm charts or an S3 bucket to a Kubernetes cluster. Extensible and contains many advanced features. CNCF incubating project with a large community. Large integration ecosystem (e.g. Cluster API, OPA)

Slide 48

Slide 48 text

@kubernetesonarm kspan: Visualization of Kubernetes, GitOps kspan listens to Kubernetes Events, and turns those into OpenTelemetry (CNCF sandbox project) spans. This plays well with GitOps, as you can watch the lifecycle of your Kubernetes state being synced by Flux Jaeger (CNCF graduated project) is a good visualization frontend for OpenTelemetry.
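For a feel of what an event-derived span looks like, here is a minimal OpenTelemetry-Go sketch that wraps a made-up Kubernetes event in a span; it is not kspan's actual implementation, and without an SDK exporter configured (e.g. towards Jaeger) the global tracer used here is a no-op, so this only illustrates the API shape.

```go
// event_span.go - sketch of turning a Kubernetes event into an OpenTelemetry
// span, in the spirit of kspan. The event data is made up, and without an SDK
// exporter configured (e.g. Jaeger) the global tracer below is a no-op.
package main

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func main() {
	tracer := otel.Tracer("kspan-sketch")

	// One parent span for the sync, one child span per observed event.
	ctx, parent := tracer.Start(context.Background(), "Deployment podinfo: sync")
	_, child := tracer.Start(ctx, "ReplicaSet podinfo-abc123: ScalingReplicaSet")
	child.SetAttributes(
		attribute.String("k8s.namespace", "default"),
		attribute.String("k8s.kind", "ReplicaSet"),
	)
	time.Sleep(10 * time.Millisecond) // stand-in for the event's duration
	child.End()
	parent.End()
}
```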

Slide 49

Slide 49 text

@kubernetesonarm kspan: Visualization of Kubernetes, GitOps Source: https://github.com/weaveworks-experiments/kspan

Slide 50

Slide 50 text

@kubernetesonarm Yay, now addressed problem 7: Declaratively controlling a fleet of edge clusters

Slide 51

Slide 51 text

@kubernetesonarm 4. Kubernetes Cluster Lifecycle 5. Automate the edge with GitOps 6. Sync the edge and the cloud Part 2: Edge Automation

Slide 52

Slide 52 text

@kubernetesonarm KubeEdge “an open source system for extending native containerized application orchestration capabilities to hosts at Edge” Incubating CNCF project. Allows for discovery and data ingestion of MQTT-compliant devices and local HTTP APIs. The edge part is configured by the cloud and syncs ingested edge data to the cloud whenever possible. Another alternative to consider would be AKRI.
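As an illustration of the MQTT ingestion path, the sketch below has a sensor process on the edge node publish readings to a local MQTT broker that an MQTT-aware edge stack can subscribe to; the broker address and topic are illustrative and not KubeEdge's actual topic scheme.

```go
// mqtt_publish.go - sketch of an edge sensor publishing data over MQTT so that
// an MQTT-aware edge stack (such as KubeEdge's edge side) can ingest it.
// Broker address and topic are illustrative, not KubeEdge's real topic scheme.
package main

import (
	"fmt"
	"log"
	"time"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	opts := mqtt.NewClientOptions().AddBroker("tcp://127.0.0.1:1883")
	client := mqtt.NewClient(opts)
	if token := client.Connect(); token.Wait() && token.Error() != nil {
		log.Fatalf("connect: %v", token.Error())
	}
	defer client.Disconnect(250)

	// Publish a reading once per second; QoS 1 = at-least-once delivery.
	for i := 0; i < 5; i++ {
		payload := fmt.Sprintf(`{"sensor":"temp-1","celsius":%.1f}`, 21.0+float64(i)*0.1)
		token := client.Publish("sensors/temp-1/readings", 1, false, payload)
		token.Wait()
		time.Sleep(time.Second)
	}
}
```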

Slide 53

Slide 53 text

@kubernetesonarm KubeEdge Collect MQTT data Control what is run on the edge Connect to HTTP apps Run “normal” containers on the Edge, e.g. AI inference tasks Cache data and keep in sync when connectivity is bad

Slide 54

Slide 54 text

@kubernetesonarm Impressive KubeEdge use-case

Slide 55

Slide 55 text

@kubernetesonarm Yay, now addressed problem 8: Keeping ingested edge data in sync with the cloud

Slide 56

Slide 56 text

@kubernetesonarm What now? Implementation left as an exercise Yes, and no. I’m working on Racklet, libgitops + more this summer to put all of these things together. Watch this announcement ==> If you’re interested, join the OSFW Slack at https://slack.osfw.dev/ If interested in GitOps, email me at [email protected] :)

Slide 57

Slide 57 text

Thank you! @luxas on Github @luxas on Kubernetes’ Slack @kubernetesonarm on Twitter [email protected] / [email protected]

Slide 58

Slide 58 text

Appendix Slides not included in the talk, but that are still relevant context

Slide 59

Slide 59 text

@kubernetesonarm Example: Raspberry Pi 4 1. On-chip bootloader that gets the bootloader from EEPROM a. Cannot be modified in any way. Runs on the SoC (GPU). 2. EEPROM -> start.elf SoC/GPU bootloaders a. The bootloader in the EEPROM can do e.g. SD Card, TFTP, USB, SSD, and NVMe booting to get start.elf and auxiliary files => complex custom piece of code b. start.elf can be thought of as the BIOS of the RPi; it’s configured through config.txt c. Both files are proprietary firmware files for the SoC/GPU. Initializes the RAM. 3. Raw ARM64 binary to load into RAM a. Before that, an armstub is run that sets up the CPUs. b. Most commonly loads Linux, but can also boot TianoCore EDK2 and/or u-boot

Slide 60

Slide 60 text

@kubernetesonarm Problems with Raspberry Pi boot 1. Proprietary EEPROM and start.elf bootloaders - Cannot know if the content is legitimate as it is not OSS 2. GPU has full access to RAM => can bypass CPU - Isolation features, exception levels, etc. sadly have no effect 3. No support for TPMs - Cannot do a “trusted boot chain” where the next step is measured before execution 4. EEPROM cannot easily be write-protected - A malicious user can gain persistence in the EEPROM

Slide 61

Slide 61 text

@kubernetesonarm ARM UEFI compliance levels The ARM bootloader ecosystem is fragmented => push to standardize. EBBR: For embedded devices, a lighter variant of SBBR. More or less what u-boot accomplishes. SBBR: For an “out-of-the-box” experience when booting various OSes. Requires UEFI+ACPI. Work in progress to provide an SBBR-compliant RPi 4 UEFI

Slide 62

Slide 62 text

@kubernetesonarm TF-A architecture

Slide 63

Slide 63 text

@kubernetesonarm Kubernetes’ high-level component architecture (diagram) Control Plane: API Server (REST API), Controller Manager (controller loops), Scheduler (binds Pods to Nodes), etcd (key-value DB, SSOT); the user talks to the API Server. Nodes 1–3: each runs an OS, Container Runtime, Kubelet and Networking. Legend: CNI, CRI, OCI, Protobuf, gRPC, JSON

Slide 64

Slide 64 text

@kubernetesonarm Cluster API The next step after kubeadm “How do I manage other lifecycle events across that infrastructure (upgrades, deletions, etc.)?” “How can we control all of this via a consistent API across providers?”

Slide 65

Slide 65 text

@kubernetesonarm Source: GitOps Today and Tomorrow: Conceptual Overview and Technical Deep Dive – Cornelia Davis

Slide 66

Slide 66 text

@kubernetesonarm libgitops: Read/write objects in files in Git easily “An ORM, a library written in Go, for Kubernetes-style API objects, stored in pluggable backends, most famously Git” Flux: “compiles” the desired declarative spec (DDS) to compiled declarative state (CDS), e.g. through the API server into etcd. Flagger/controller-runtime: Act on and reconcile the actual state (CDS). libgitops controller: Reconciles the actual state (CDS) back into a new desired spec (DDS). Source: GitOps Today and Tomorrow: Conceptual Overview and Technical Deep Dive – Cornelia Davis

Slide 67

Slide 67 text

@kubernetesonarm libgitops enabling the “Future of GitOps” vision: 1. Interface-driven encoders, decoders, versioners and recognizers in the system 2. Abstract Storage mechanism, target can be anywhere 3. Any Object can be managed by the system due to 1) 4. Generic Transaction model and engine built-in 5. Lets users build e.g. GUIs