Platform Engineering on Kubernetes 2025

Slide 1

Slide 1 text

Platform Engineering on Kubernetes: The past, present and future - Jan '25 - London, UK

Slide 2

Slide 2 text

@Salaboy Mauricio Salatino https://salaboy.com

Slide 3

Slide 3 text

Platform Engineering on Kubernetes (Book)
- Started in Feb 2020
- Published Oct 2023 by Manning
- Practical approach
- Focused on platform backends, core mechanisms and APIs

Slide 4

Slide 4 text

Platform Engineering on Kubernetes - The early access program was announced in September 2021 under a different title

Slide 5

Slide 5 text

Book Chapters
- Intro
- CI/CD
- GitOps
- Building a Platform (provisioning a development environment)
- Platform Capabilities
- Release Strategies
- Enabling Developers with feature flags and APIs
- Measuring platform initiatives with DORA

Slide 6

Slide 6 text

Book Repository: step-by-step tutorials https://github.com/salaboy/platforms-on-k8s

Slide 7

Slide 7 text

The Past (MMXX-MMXXII) 2020 - 2022

Slide 8

Slide 8 text

Why do organizations adopt Kubernetes? (source: IBM)
1. Container orchestration savings
2. Increased DevOps efficiency for microservices architecture
3. Deploying workloads in multicloud environments
4. More portability with less chance of vendor lock-in
5. Automation of deployment and scalability
6. App stability and availability in a cloud environment
7. Open-source benefits of Kubernetes

Slide 9

Slide 9 text

But... it gets really complicated really fast
- To efficiently deliver applications, Kubernetes built-in mechanisms are not enough
- It is expensive to train a whole organization to learn Kubernetes, including DevOps, developers and operations teams
- Different teams have different requirements, so you need to extend Kubernetes to fit their needs
- Managing all these custom Kubernetes extensions is a mess
- Teams need to manage all this complexity

Slide 10

Slide 10 text

Typical Kubernetes adoption journey Kubernetes Maturity Model (very DevOpsy - 2021)

Slide 11

Slide 11 text

Kubernetes & cloud native adoption maturity model (technical)
1. Evaluate (aka PoCs)
2. Adopt (running production workloads)
3. Automate / Extend / Optimize your delivery pipelines
4. Design your platform capabilities

Slide 12

Slide 12 text

#1 Evaluate (aka PoCs)
1. Kubernetes basics -> Deployments, YAML, Tools
2. Observability (logs, metrics, traces)
3. Security (mTLS, secrets, service meshes)
4. Scalability
In this stage:
- Small teams playing around with Kubernetes and learning the basics
- No company buy-in yet
- Big learning curve, but there are tons of resources and companies that will help you today

Slide 13

Slide 13 text

#1 Evaluate (aka PoCs) (tools)

Slide 14

Slide 14 text

#2 Adopt (running production workloads)
1. Continuous integration and containerization challenges
2. Deployment and GitOps
3. Cloud resources provisioning and management (IaC)
In this stage:
- Multiple teams working together, the company is sold on Kubernetes
- Mostly coordination challenges across teams doing things differently

Slide 15

Slide 15 text

#2 Adopt (running production workloads) Tools

Slide 16

Slide 16 text

#3 Automate/Extend/Optimize your delivery pipelines
1. Create new domain-specific extensions (custom controllers and operators)
2. Bring third-party solutions to specific problems (from the cloud native ecosystem)
3. Evaluate Kubernetes distributions / platforms such as OpenShift
4. Start looking into multi-cluster challenges and management
In this stage:
- Focus on optimizing how the work is done and automation
- Focus on the entire software delivery pipeline, not just deployments

Slide 17

Slide 17 text

#3 Automate/Extend/Optimize (tools)

Slide 18

Slide 18 text

(Personal journey)
- I joined Diagrid in Nov 2022, a startup created around the Dapr project (https://dapr.io)
- Dapr was created by Microsoft in 2019 and donated to the CNCF in 2021
- Focused on enabling developers by abstracting complex infrastructure using APIs
- There are not many projects with this focus in the CNCF

Slide 19

Slide 19 text

The Present (MMXXIII-MMXXV) 2023 - 2025

Slide 20

Slide 20 text

#4 Design your platform capabilities
1. Have a dedicated Platform team building glue for teams to consume
   a. Very opinionated in choosing tools, because they will need to maintain them for multiple teams
2. Platform teams start by automating topics in #1, #2 and #3, hiding the complexity of these decisions
3. Hiding low-level details
4. Consolidation of experiences (devexp)
5. Centralized access
In this stage:
- Platforms focus on consumers, not on automation
- Platforms are not just for developers; different initiatives will target different personas (devs, data scientists, SREs, Ops)

Slide 21

Slide 21 text

#4 Design your platform capabilities (tools)

Slide 22

Slide 22 text

Where are you in the adoption journey?
- Classify a customer based on which stage / phase they are in as an organization
- Classify a platform team based on which stage that particular team is in
- They care about different aspects depending on the phase
  - #1 Evaluate: How does this project/product help me adopt Kubernetes faster?
  - #2 Adopt: How does this project/product fit with the choices that I’ve already made (deployment mechanisms, service providers)?
  - #3 Customize / Extend / Consume: What set of functionalities does this product/project add to my existing stack? Who are the target users? How much complexity does this add to my overall solution?
  - #4 Platform Initiatives: What are the current platform team priorities? How does this project/product map to those priorities?
- If they are a development team, the priorities and phases are completely different
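The stage-to-question mapping on this slide can be written down as a small lookup table. The Python sketch below is only an illustration: the `Stage` enum, the `QUESTIONS` table and the `questions_for` helper are hypothetical names, and the question text is taken from the slide.

```python
from enum import Enum


class Stage(Enum):
    """Adoption stages from the maturity model (names are illustrative)."""
    EVALUATE = 1
    ADOPT = 2
    EXTEND = 3
    PLATFORM = 4


# What an organization (or platform team) tends to ask at each stage.
QUESTIONS = {
    Stage.EVALUATE: [
        "How does this project/product help me adopt Kubernetes faster?",
    ],
    Stage.ADOPT: [
        "How does this project/product fit with the choices I've already made?",
    ],
    Stage.EXTEND: [
        "What functionality does this add to my existing stack?",
        "Who are the target users?",
        "How much complexity does this add to my overall solution?",
    ],
    Stage.PLATFORM: [
        "What are the current platform team priorities?",
        "How does this project/product map to those priorities?",
    ],
}


def questions_for(stage: Stage) -> list[str]:
    """Return the questions that matter for a given adoption stage."""
    return QUESTIONS[stage]


if __name__ == "__main__":
    for stage in Stage:
        print(f"#{stage.value} {stage.name.title()}")
        for question in questions_for(stage):
            print(f"  - {question}")
```

A table like this is mostly useful as a checklist: once the stage is known, the conversation can start from the questions that stage actually cares about.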

Slide 23

Slide 23 text

Consolidation

Slide 24

Slide 24 text

Consolidation: BACK Stack (https://backstack.dev/)
- Public and opinionated combinations
- Backstage + Argo CD + Crossplane + Kyverno
- Showing how Kyverno fits into the ecosystem

Slide 25

Slide 25 text

Consolidation: CNOE (Cloud Native Operational Excellence) https://cnoe.io/
- More complex and pluggable
- Driven by AWS - EKS
- Based on what their customers are using

Slide 26

Slide 26 text

Platform Engineering Maturity Model (link) 2023 (organizational) How mature are the Platform Engineering practices for a given organization?

Slide 27

Slide 27 text

Platform Engineering Capability Model (link) (organizational) - Microsoft take

Slide 28

Slide 28 text

Centralized access & Developer Experience

Slide 29

Slide 29 text

Developer Experience: Podman ecosystem
- Red Hat’s alternative to Docker (OCI)
- End-to-end from developers to Kubernetes
- Integrated with developer tooling
- Podman was donated to the CNCF in Nov 2024

Slide 30

Slide 30 text

Developers and Cloud Native (https://tag-app-delivery.cncf.io/wgs/app-development/)

Slide 31

Slide 31 text

The Future (MMXXV-MMXXX) 2025 - 2030

Slide 32

Slide 32 text

CNCF Platform Engineering Programs (link)

Slide 33

Slide 33 text

KubeCon EU London 2025 App Dev Track

Slide 34

Slide 34 text

But there is a lot of work to do around App Dev
- When Platform Teams meet developers
  - Blog
  - KCD UK, with Abby Bangser: https://www.youtube.com/watch?v=-uWcMxSEfrw
- Giving developers a KIND cluster doesn’t solve real problems
- Meet developers where they are (don’t make developers learn new things)
- Ops & DevOps & Platform Teams need to learn about the tools that developers are using
- Developers inherit Frankenstein experiences (using different CLIs, user interfaces and tools that were designed with different personas in mind)

Slide 35

Slide 35 text

Developer Experience & developer communities
- The more mature the platform initiatives are, the closer we get to developers
- A push around developer experience has already started, but we need to be more concrete about what it means
- As platform engineers
  - We need to get more involved with developer tooling
  - We need to get closer to developer communities
- The state of developer experiences in 2025 whitepaper by the App Dev WG

Slide 36

Slide 36 text

KubeCon Hong Kong + Japan 👀
- Big opportunity for collaboration with other industries
- Engaging new communities and possible business
- Getting new feedback on initiatives that usually start in the US and Europe

Slide 37

Slide 37 text

The evolution of multi-cluster challenges
- Kueue https://kueue.sigs.k8s.io/
  - Distributed and multi-cluster Job scheduling
  - This highlights the kind of challenges that we will be solving next
- KCP (https://www.kcp.io/) is always there in the background (unified interface to talk to multiple clusters)
  - Dropped by Red Hat
  - These folks were too early in the game
  - Gaining momentum again now
- Kubernetes Workspaces proposal by Apple
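To make the Kueue bullet a little more concrete, here is a minimal Python sketch that builds a Kubernetes Job manifest intended to be queued by Kueue. It relies on the `kueue.x-k8s.io/queue-name` label and on creating the Job suspended so that Kueue decides when to admit it. The queue name `team-a-queue`, the image, the resource requests and the `queued_job` helper are assumptions for illustration; a matching LocalQueue would have to exist in the cluster.

```python
import json


def queued_job(name: str, queue: str, image: str, command: list[str]) -> dict:
    """Build a batch/v1 Job manifest that Kueue can manage (illustrative helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": name,
            # Label that points the Job at a Kueue LocalQueue.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            # Created suspended; Kueue unsuspends the Job once quota is available.
            "suspend": True,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "main",
                            "image": image,
                            "command": command,
                            # Requests are what Kueue counts against the queue's quota.
                            "resources": {"requests": {"cpu": "1", "memory": "200Mi"}},
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical values; kubectl accepts JSON manifests as well as YAML.
    job = queued_job("sample-job", "team-a-queue", "busybox", ["sleep", "30"])
    print(json.dumps(job, indent=2))
```

The interesting part is what is missing: the Job says nothing about which cluster or node pool it runs on, which is exactly the decision a queueing layer takes over.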

Slide 38

Slide 38 text

AI-Enabled Platforms
- Running AI workloads on Kubernetes is challenging, but things are moving forward
- AI / LLM Gateways / APIs -> Envoy and Solo AI Gateways
- AI Agents and frameworks
- Expect more integration problems, the expensive kind

Slide 39

Slide 39 text

What’s next (Salaboy’s wish list)
- Companies behind open source projects joining forces to create stacks and solutions that play nicely together
- We manage to bring developers closer to platform engineers
  - This means that we attract developers to participate in what have been more infrastructure-centric discussions
- We help developers communicate what they need from platforms, and we find the right channels to make this communication efficient

Slide 40

Slide 40 text

Thanks! @salaboy / @daprdev / @diagridio

Slide 41

Slide 41 text

#1 Evaluation: How does Dapr fit in this stage?
- Topics that resonate at this point are Observability and Security
- How can Dapr help teams at this stage to go faster without sounding too complex? (adding a new tool to their stack is always the entry point)
- Smooth adoption journey for Kubernetes without adding new problems
- Dapr Value Proposition:
  - Dapr helps to make your applications observable, resilient and secure across clouds (future-proofing)

Slide 42

Slide 42 text

#2 Adopt: How does Dapr fit in this stage?
- Topics that resonate here are
  - Components, Configurations & Resiliency Policies abstract away infrastructure, so applications can rely on a unified API to access cloud infrastructure
  - Which Kubernetes resources does Dapr provide and what problems do they solve?
  - They are probably doing modernization of infra
  - Operations at scale (SREs): Conductor makes a lot of sense (running Dapr at scale)
  - (IMPORTANT) They don’t care about developers at this stage, so we should focus on Resources here
- Dapr Value Prop:
  - “Abstract Infrastructure from Application” (enabling cross-cloud applications)
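As a rough illustration of the "unified API" idea, the Python sketch below builds the HTTP requests an application would send to its local Dapr sidecar to save and read state. The application only names a state store; which backing service sits behind it is decided by a Dapr Component, not by the code. The sketch only constructs the requests, so it runs without a sidecar. The store name `statestore`, the default sidecar port `3500` and the helper names are assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, Optional

DAPR_HTTP_PORT = 3500        # assumption: default Dapr sidecar HTTP port
STATE_STORE = "statestore"   # assumption: name of a configured state store Component


def state_url(store: str = STATE_STORE, key: Optional[str] = None) -> str:
    """Build the sidecar URL for the v1.0 state API."""
    base = f"http://localhost:{DAPR_HTTP_PORT}/v1.0/state/{store}"
    return f"{base}/{key}" if key else base


def build_save_request(key: str, value: Any) -> urllib.request.Request:
    """POST a list of key/value items to the state store."""
    body = json.dumps([{"key": key, "value": value}]).encode("utf-8")
    return urllib.request.Request(
        state_url(),
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_get_request(key: str) -> urllib.request.Request:
    """GET a single key from the state store."""
    return urllib.request.Request(state_url(key=key), method="GET")


if __name__ == "__main__":
    save = build_save_request("order-1", {"status": "paid"})
    print(save.get_method(), save.full_url)
    print(save.data.decode("utf-8"))
    get = build_get_request("order-1")
    print(get.get_method(), get.full_url)
    # With a sidecar running, urllib.request.urlopen(save) would send the request.
```

Swapping the backing store (for example, from one cloud's database to another's) would then be a change to the Component definition rather than to this code, which is the point of the value proposition on this slide.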

Slide 43

Slide 43 text

How does Dapr fit in this stage?
- Topics that resonate here are
  - Project maturity
  - Case studies with other companies adopting this project, showing that the project is production ready
  - There is a company providing support and a product to run Dapr at scale (Conductor)
- Dapr Value Prop:
  - Enable workloads to have less friction across environments (access control)
  - Zero trust workloads

Slide 44

Slide 44 text

How does Dapr fit in this stage?
- Topics that resonate here are
  - Ecosystem player: how do we integrate with other solutions?
  - Integrates with Cloud Providers
  - Can we tap into the automation processes they already have in place? Which mechanisms do we provide?
  - How do we augment existing golden paths?
- What’s the value proposition:
  - For existing apps (brownfield)
  - For new apps (greenfield)
  - For new initiatives like extending existing apps with AI
- Dapr Value Prop:
  - Enable developer teams to go faster (design patterns, best practices, solutions to distributed application challenges)
  - Enable operations to quickly troubleshoot apps + infra