Platform Engineering on Kubernetes 2025

Salaboy
January 31, 2025

for more information visit: https://www.salaboy.com

Transcript

  1. Platform Engineering on Kubernetes (Book)

    - Started in Feb 2020
    - Published Oct 2023 by Manning
    - Practical approach
    - Focused on platform backends, core mechanisms and APIs
  2. Platform Engineering on Kubernetes

    - The early access program was announced in September 2021 under a different title
  3. Book Chapters

    - Intro
    - CI/CD
    - GitOps
    - Building a Platform (provisioning a development environment)
    - Platform Capabilities
    - Release Strategies
    - Enabling Developers with feature flags and APIs
    - Measuring platform initiatives with DORA
  4. Why do organizations adopt Kubernetes? (source IBM)

    1. Container orchestration savings
    2. Increased DevOps efficiency for microservices architecture
    3. Deploying workloads in multicloud environments
    4. More portability with less chance of vendor lock-in
    5. Automation of deployment and scalability
    6. App stability and availability in a cloud environment
    7. Open-source benefits of Kubernetes
  5. But... it gets really complicated really fast

    - To deliver applications efficiently, Kubernetes built-in mechanisms are not enough
    - It is expensive to train a whole organization on Kubernetes, including DevOps, developers and operations teams
    - Different teams have different requirements, so you need to extend Kubernetes to fit their needs
    - Managing all these custom Kubernetes extensions is a mess
    - Teams need to manage all this complexity
  6. Kubernetes & cloud native adoption maturity model (technical)

    1. Evaluate (aka PoCs)
    2. Adopt (running production workloads)
    3. Automate / Extend / Optimize your delivery pipelines
    4. Design your platform capabilities
  7. #1 Evaluate (aka PoCs)

    1. Kubernetes basics -> Deployments, YAML, Tools
    2. Observability (logs, metrics, traces)
    3. Security (mTLS, secrets, service meshes)
    4. Scalability

    In this stage:
    - Small teams playing around with Kubernetes and learning the basics
    - No company buy-in yet
    - Big learning curve, but there are tons of resources and companies that will help you today
  8. #2 Adopt (running production workloads)

    1. Continuous integration and containerization challenges
    2. Deployment and GitOps
    3. Cloud resources provisioning and management (IaC)

    In this stage:
    - Multiple teams working together, the company is sold on Kubernetes
    - Mostly coordination challenges across teams doing things differently
  9. #3 Automate / Extend / Optimize your delivery pipelines

    1. Create new domain-specific extensions (custom controllers and operators)
    2. Bring in third-party solutions to specific problems (from the cloud native ecosystem)
    3. Evaluate Kubernetes distributions / platforms such as OpenShift
    4. Start looking into multi-cluster challenges and management

    In this stage:
    - Focus on optimizing how the work is done and on automation
    - Focus on the entire software delivery pipeline, not just deployments
  10. (Personal journey)

    - I joined Diagrid in Nov 2022, a startup created around the https://dapr.io project
    - Created at Microsoft in 2019 and later donated to the CNCF
    - Focused on enabling developers by abstracting complex infrastructure using APIs
    - There are not many projects with this focus in the CNCF
  11. #4 Design your platform capabilities

    1. Have a dedicated Platform team building glue for teams to consume
      a. Very opinionated in choosing tools, because they will need to maintain them for multiple teams
    2. Platform teams start by automating topics in #1, #2 and #3, hiding the complexity of these decisions
    3. Hiding low-level details
    4. Consolidation of experiences (devexp)
    5. Centralized access

    In this stage:
    - Platforms focus on consumers, not on automation
    - Platforms are not just for developers, different initiatives will target different personas (devs, data scientists, SREs, Ops)
  12. Where are you in the adoption journey?

    - Classify a customer based on which stage / phase they are in as an organization
    - Classify a platform team based on which stage that particular team is in
    - They care about different aspects depending on the phase
      - #1 Evaluate: How does this project/product help me to adopt Kubernetes faster?
      - #2 Adopt: How does this project/product fit with the choices that I’ve made already (deployment mechanisms, service providers)?
      - #3 Customize / Extend / Consume: What set of functionalities does this product/project add to my existing stack, and who are the target users? How much complexity does this add to my overall solution?
      - #4 Platform Initiatives: What are the current platform team priorities? How does this project/product map to those priorities?
    - If they are a development team, priorities and phases are completely different
  13. Consolidation: BACK Stack (https://backstack.dev/)

    - Public and opinionated combinations
    - Backstage + Argo CD + Crossplane + Kyverno
    - Showing how Kyverno fits into the ecosystem
  14. Consolidation: CNOE (cloud native operational excellence) https://cnoe.io/

    - More complex and pluggable
    - Driven by AWS - EKS
    - Based on what their customers are using
  15. Platform Engineering Maturity Model (link) 2023 (organizational)

    - How mature are the Platform Engineering practices for a given organization?
  16. Developer Experience: Podman ecosystem

    - Red Hat’s alternative to Docker (OCI)
    - End-to-end from developers to Kubernetes
    - Integrated with developer tooling
    - Podman was donated to the CNCF in Nov 2024
  17. But there is a lot of work to do around App Dev

    - When Platform Teams meet developers (Blog; KCD UK, with Abby Bangser) https://www.youtube.com/watch?v=-uWcMxSEfrw
    - Giving developers a KIND cluster doesn’t solve real problems
    - Meet developers where they are (don’t make developers learn new things)
    - Ops, DevOps and Platform Teams need to learn about the tools that developers are using
    - Developers inherit Frankenstein experiences (using different CLIs, user interfaces and tools that were designed with different personas in mind)
  18. Developer Experience & developer communities

    - The more mature the platform initiatives are, the closer we get to developers
    - A push around developer experience has already started, but we need to be more concrete about what it means
    - As platform engineers:
      - We need to get more involved with developer tooling
      - We need to get closer to developer communities
    - The state of developer experiences in 2025 whitepaper by the App Dev WG
  19. KubeCon Hong Kong + Japan 👀

    - Big opportunity for collaboration with other industries
    - Engaging new communities and possible business
    - Getting new feedback on initiatives that usually start in the US and Europe
  20. The evolution of multi-cluster challenges

    - Kueue https://kueue.sigs.k8s.io/
      - Distributed and multi-cluster Job scheduling
      - This highlights the kind of challenges that we will be solving next
    - KCP (https://www.kcp.io/) is always there in the background (unified interface to talk to multiple clusters)
      - Dropped by Red Hat
      - These folks were too early in the game
      - Gaining a second momentum now
    - Kubernetes Workspaces proposal by Apple
  21. AI-Enabled Platforms

    - Running AI workloads on Kubernetes is challenging, but things are moving forward
    - AI / LLM Gateways / APIs -> Envoy and Solo AI Gateways
    - AI Agents and frameworks
    - Expect more integration problems, the expensive kind
  22. What’s next (Salaboy’s wish list)

    - Companies behind open source projects joining forces to create stacks and solutions that play nicely together
    - We manage to bring developers closer to platform engineers
      - This means that we attract developers to participate in what have been more infrastructure-centric discussions
    - We help developers to communicate what they need from platforms, and we find the right channels to make this communication efficient
  23. #1 Evaluation: How does Dapr fit in this stage?

    - Topics that resonate at this point are Observability and Security
    - How can Dapr help teams at this stage to go faster without sounding too complex? (adding a new tool to their stack is always the entry point)
    - Smooth adoption journey for Kubernetes without adding new problems
    - Dapr Value Proposition:
      - Dapr helps to make your applications observable, resilient and secure cross cloud (future proofing)
  24. #2 Adopt: How does Dapr fit in this stage?

    - Topics that resonate here are:
      - Components, Configurations & Resiliency Policies abstract away infrastructure so applications can rely on a unified API to access cloud infrastructure
      - Which Kubernetes resources does Dapr provide and what problems do they solve?
    - They are probably modernizing their infrastructure
    - Operations at scale (SREs): Conductor makes a lot of sense (running Dapr at scale)
    - (IMPORTANT) They don’t care about developers at this stage, so we should focus on Resources here
    - Dapr Value Prop:
      - “Abstract Infrastructure from Application” (enabling cross cloud applications)
  25. How does Dapr fit in this stage?

    - Topics that resonate here are:
      - Project maturity
      - Case studies with other companies adopting this project, showing that the project is production ready
      - There is a company providing support and a product to run Dapr at scale (Conductor)
    - Dapr Value Prop:
      - Enable workloads to have less friction across environments (access control)
      - Zero trust workloads
  26. How does Dapr fit in this stage?

    - Topics that resonate here are:
      - Ecosystem player: how do we integrate with other solutions?
      - Integrates with cloud providers
      - Can we tap into the automation processes they already have in place? Which mechanisms do we provide?
      - How do we augment existing golden paths?
    - What’s the value proposition:
      - For existing apps (brownfield)
      - For new apps (greenfield)
      - For new initiatives like extending existing apps with AI
    - Dapr Value Prop:
      - Enable developer teams to go faster (design patterns, best practices, solutions to distributed application challenges)
      - Enable operations to quickly troubleshoot apps + infra
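
The "unified API" idea from slide 24 can be made a bit more concrete with a small sketch. This is not from the deck. It assumes the Dapr sidecar's HTTP state API is reachable on localhost at port 3500 (a common default), and the store name `statestore`, the key and the helper function names are hypothetical. The helpers only build the requests an application would send. They do not contact a sidecar, so the example runs without Dapr installed.

```python
import json

# Assumption: the Dapr sidecar listens for HTTP on this port (a common default).
DAPR_HTTP_PORT = 3500


def build_state_save_request(store_name, key, value, port=DAPR_HTTP_PORT):
    """Return (method, url, body) for saving one key via the Dapr state API.

    The application only names a state store component; which database
    backs that component is decided by platform-side configuration.
    """
    url = f"http://localhost:{port}/v1.0/state/{store_name}"
    body = json.dumps([{"key": key, "value": value}])
    return "POST", url, body


def build_state_get_request(store_name, key, port=DAPR_HTTP_PORT):
    """Return (method, url, body) for reading one key back."""
    url = f"http://localhost:{port}/v1.0/state/{store_name}/{key}"
    return "GET", url, None


if __name__ == "__main__":
    # "statestore" and "order-1" are hypothetical names used for illustration.
    print(build_state_save_request("statestore", "order-1", {"status": "paid"}))
    print(build_state_get_request("statestore", "order-1"))
```

The point of the sketch is that nothing in the application code names Redis, PostgreSQL or a cloud service. Swapping the backing store would be a change to the component definition that the platform team owns, not to the application.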