- BSc student at Aalto University, Finland
- CNCF Ambassador, Certified Kubernetes Administrator and Emeritus Kubernetes WG/SIG Lead
- KubeCon Speaker in Berlin, Austin, Copenhagen, Shanghai, Seattle, San Diego & Valencia
- KubeCon Keynote Speaker in Barcelona
- Former Kubernetes approver and subproject owner, active in the OSS community for 7+ years. Worked on e.g. SIG Cluster Lifecycle => kubeadm to GA.
- Weaveworks contractor, Weave Ignite & libgitops author
- Cloud Native Nordics co-founder & meetup organizer
- Guild of Automation and Systems Technology corporate relations & CFO
dream true
Diagram: JSON container workload specification => HTTP POST JSON object => REST API server; Container Workload Controller: read desired state, then pull, start, re-start, monitor
*The process doesn’t look exactly like this; it is a simplified mental model for now.
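The mental model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the workload schema (`name`, `image`), the in-memory `desired_store`, and the functions `handle_post` and `next_action` are all hypothetical names, and no real HTTP server or container runtime is involved.

```python
import json

# Hypothetical in-memory stand-in for the REST API server's storage.
desired_store = {}

def handle_post(body: str) -> dict:
    """Simulate the API server accepting an HTTP POST with a JSON object."""
    spec = json.loads(body)
    # Minimal validation of the made-up workload schema.
    for field in ("name", "image"):
        if field not in spec:
            raise ValueError(f"missing field: {field}")
    desired_store[spec["name"]] = spec
    return spec

def next_action(spec: dict, local_images: set, running: set) -> str:
    """The controller reads the desired state and picks one step to take."""
    if spec["image"] not in local_images:
        return "pull"
    if spec["name"] not in running:
        return "start"  # also covers re-starting a crashed container
    return "monitor"

# Submit the desired state, then let the controller decide what to do.
handle_post(json.dumps({"name": "web", "image": "nginx:1.25"}))
spec = desired_store["web"]
steps = [
    next_action(spec, local_images=set(), running=set()),
    next_action(spec, local_images={"nginx:1.25"}, running=set()),
    next_action(spec, local_images={"nginx:1.25"}, running={"web"}),
]
print(steps)
```

The user only ever states what should run; the order of pull, start and monitor falls out of comparing that statement with what is present on the machine.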
Google Finding: “Failure is the Norm”
“... events, load spikes, machine failures, hardware upgrades, and large-scale partial failures (e.g., a power supply bus duct)”
Source: (Verma et al., 2015)
Legend: Operating System (v1, v2, v3), Configuration (A, B, C), Power (On, Off, Out)
Server 1: OS v1, Config A, Power On
Server 2: OS v1, Config A, Power On
Server 3: OS v1, Config A, Power On
Example: Sysadmin A gets three new servers and installs the same operating system on all of them, with exactly the same configuration. In the beginning, the system is completely ordered: all instances are identically configured.
Server 1: OS v1, Config A, Power On
Server 2: OS v2, Config A, Power On
Server 3: OS v2, Config A, Power On
After some time, a critical “v2” security upgrade to the operating system becomes available. Sysadmin A upgrades servers 2 and 3, but not server 1: it is running a critical database service, and A is afraid to disturb it.
Server 1: OS v1, Config B, Power On (slow disk access time)
Server 2: OS v2, Config A, Power On
Server 3: OS v2, Config A, Power On
Server 1 complains about slow disk access time, due to a misconfiguration in the operating system. Sysadmin A fixes it imperatively on the complaining server until the complaints stop, but changes nothing on the other servers.
Server 1: OS v1, Config B, Power On
Server 2: OS v2, Config A, Power Off
Server 3: OS v2, Config A, Power On
Sysadmin A has noticed that the number of users has dropped because of a seasonal trend, so A decides to turn server 2 off to save on energy costs.
Server 1: OS v1, Config B, Power On
Server 2: OS v2, Config A, Power Off
Server 3: OS v2, Config C, Power On (slow disk access time)
The next week, while sysadmin A is on vacation, server 3 complains about the same error as server 1 did earlier. Sysadmin B “solves” the issue (differently from how A fixed server 1), but does nothing to the other servers.
Server 1: OS v1, Config B, Power On
Server 2: OS v2, Config A, Power Off
Server 3: OS v3, Config C, Power On (desired state change)
Now, a new version of the operating system is released with a very cool feature that would be useful to the sysadmins. However, upgrading is risky because of incompatibilities, so they only upgrade server 3 to try it out.
Server 1: OS v1, Config B, Power On
Server 2: OS v2, Config A, Power Off
Server 3: OS v3, Config C, Power Out (emergent state change)
Suddenly, a thunderstorm reaches the area where the servers are, and lightning strikes. Due to the lack of overvoltage protection, server 3’s power supply becomes unusable, and the server shuts down.
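The end state of this example can be written down as data and compared against a declared target. The sketch below is illustrative only: the `desired` values (OS v2, Config A, Power On) are an assumption made for the example, as the slides do not state which target the sysadmins had in mind, and `drift` is a hypothetical helper.

```python
# Assumed desired state for every server (not stated in the slides).
desired = {"os": "v2", "config": "A", "power": "On"}

# Actual state of the three servers at the end of the story.
actual = {
    "server-1": {"os": "v1", "config": "B", "power": "On"},
    "server-2": {"os": "v2", "config": "A", "power": "Off"},
    "server-3": {"os": "v3", "config": "C", "power": "Out"},
}

def drift(desired: dict, observed: dict) -> dict:
    """Return {field: (observed, desired)} for every field that differs."""
    return {
        field: (observed.get(field), want)
        for field, want in desired.items()
        if observed.get(field) != want
    }

report = {name: drift(desired, state) for name, state in actual.items()}

# Number of distinct configurations: 1 at the start of the story, 3 at the end.
distinct_states = len({tuple(sorted(s.items())) for s in actual.values()})

for name, fields in report.items():
    print(name, fields)
print("distinct states:", distinct_states)
```

Every server has drifted in a different way, which is what makes manual, per-machine fixes hard to keep track of.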
inevitably becoming less ordered, and thus b) need some periodic corrective action to steer the course towards c) some declared desired state of the system.
Imperative flow:
Diagram labels: Web UI; admin is home / asleep
Half of the replicas go down :(
Nothing happens; either the admin has to wake up, or the system has to wait until the next morning.
Area of uncertainty grows!
Declarative flow:
Diagram labels: Web UI; Desired State Store; admin is home / asleep
System operating in good condition
This design philosophy is why e.g. Kubernetes is called “self-healing”.
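A minimal sketch of the declarative flow, assuming a hypothetical `desired_state_store` that holds only a replica count and a `reconcile` function that a controller would call periodically. It is not how Kubernetes implements this, only the same idea in miniature.

```python
import itertools

# Hypothetical desired state store: the admin declares a replica count once.
desired_state_store = {"replicas": 4}

def reconcile(store: dict, running: set, ids) -> set:
    """Return a replica set that matches the declared replica count."""
    running = set(running)  # work on a copy
    while len(running) < store["replicas"]:
        running.add(f"replica-{next(ids)}")  # start a replacement
    while len(running) > store["replicas"]:
        running.remove(max(running))  # stop a surplus replica
    return running

ids = itertools.count(1)
running = reconcile(desired_state_store, set(), ids)

# Half of the replicas go down while the admin is asleep.
survivors = set(sorted(running)[:2])

# The next controller run restores the declared count without human help.
healed = reconcile(desired_state_store, survivors, ids)
print(sorted(healed))
```

In the imperative flow the last call never happens: nobody re-reads the intended replica count until a human issues a new command.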
Cloud Native is all about pluggable APIs forming consistent abstractions that projects can implement and/or rely on. These CNCF/LF projects contain only a specification, no implementation:
the claim(s)
Control loop diagram: Desired State Source, Observe and diff, Act, Target System, Report (Actual State Sink)
1: Desired State
2, 6: Actual State
3: Action Plan
4: Action
5: Result
7: Requeue
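The numbered steps of the control loop can be sketched as follows. This is a simplified illustration, not a real controller framework: `observe_and_diff`, `act` and `run` are hypothetical names, the target system is a plain dictionary, and the loop requeues only while it still finds differences so that the example terminates.

```python
from collections import deque

def observe_and_diff(desired: dict, actual: dict) -> list:
    """Compare desired and actual state and produce an action plan (step 3)."""
    plan = []
    for key, want in desired.items():
        if actual.get(key) != want:
            plan.append(("set", key, want))
    for key in actual:
        if key not in desired:
            plan.append(("delete", key, None))
    return plan

def act(target: dict, plan: list) -> list:
    """Apply each action to the target system (4) and collect results (5)."""
    results = []
    for op, key, value in plan:
        if op == "set":
            target[key] = value
        else:
            del target[key]
        results.append((op, key, "ok"))
    return results

def run(desired_source: dict, target: dict, sink: list, max_rounds: int = 5) -> int:
    """Run the loop until nothing is left to do or max_rounds is reached."""
    queue = deque(["reconcile"])
    rounds = 0
    while queue and rounds < max_rounds:
        queue.popleft()
        rounds += 1
        desired = dict(desired_source)               # 1: desired state
        actual = dict(target)                        # 2: actual state
        plan = observe_and_diff(desired, actual)     # 3: action plan
        results = act(target, plan)                  # 4: action, 5: result
        sink.append({"actual": dict(target), "results": results})  # 6: report
        if plan:
            queue.append("reconcile")                # 7: requeue
    return rounds

desired_source = {"os": "v2", "config": "A"}
target = {"os": "v1", "config": "A", "debug": "on"}
sink = []
rounds = run(desired_source, target, sink)
print(rounds, target)
```

The second round finds nothing to change and only reports. A long-running controller would typically also requeue on a timer, so that drift appearing later is still noticed.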
APIs: Endpoints, Services, Ingress & Gateway
- Registers its custom Network APIs with Kubernetes for advanced features
- “Compiles” routing and eBPF rules for you on the fly, based on the desired state you specified in the cluster => you never have to write detailed rules
- Encodes human-like operational knowledge about configuring networks into a reusable tool controlled by declarative APIs
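The idea of “compiling” low-level rules from declared state can be illustrated with a toy example. Everything here is hypothetical: the `services` and `endpoints` dictionaries are simplified stand-ins for the corresponding cluster objects, the rule strings are an invented format, and `compile_rules` does not reflect how any real networking project generates routing or eBPF rules.

```python
def compile_rules(services: dict, endpoints: dict) -> list:
    """Derive one forwarding rule per (service, backend) pair."""
    rules = []
    for name, svc in sorted(services.items()):
        for backend in sorted(endpoints.get(name, [])):
            rules.append(
                f"{svc['vip']}:{svc['port']} -> {backend}:{svc['target_port']}"
            )
    return rules

# Desired state as the user declares it: what should be reachable, not how.
services = {"web": {"vip": "10.0.0.10", "port": 80, "target_port": 8080}}
endpoints = {"web": ["10.1.0.6", "10.1.0.5"]}

rules = compile_rules(services, endpoints)
for rule in rules:
    print(rule)
```

When a backend is added or removed, re-running the function over the new desired state yields the new rule set, so nobody edits individual rules by hand.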
more details!
Available openly on GitHub: https://github.com/luxas/research
CC-BY-SA 4.0 licensed
Encoding human-like operational knowledge using declarative Kubernetes operator patterns
“Control through choreography” based on experience
2. Periodic controller action for fighting inevitable chaos
3. Declarativeness allows defining a (portable) end goal
4. Control loops + extensible declarative APIs = operators