Slide 1

Slide 1 text

qaware.de Make developers fly: Principles for platform engineering Alexander Eimer Senior Software Engineer, QAware ⚠ Contains AI ⚠

Slide 2

Slide 2 text

Who is a platform engineer? → GP / AI

Slide 3

Slide 3 text

What’s happened so far…

Slide 4

Slide 4 text

DevOps: Wall of Confusion
Diagram labels: Developer, Operator, Value chain

Slide 5

Slide 5 text

DevOps: Wall of Confusion
Developer, key tasks:
● Able to react quickly to market changes and develop new features.
● Success is often measured by the frequency of deliveries.
Operator, key tasks:
● Stable, secure and reliable services for customers.
● Success is often measured by the reliability of the system.
Consequences:
● Opposing goals lead to conflict, mistrust and ultimately to the creation of silos.
● Software is “thrown over the fence” without consideration of operational feasibility or operational aspects.
● Operations complicates deliveries through bureaucratic processes in order to maintain control.
● In the worst case this results in frequent downtimes, poor response times and stagnation of the value chain. This threatens all business areas.

Slide 6

Slide 6 text

DevOps: Definition
“DevOps describes a process improvement method from the software development and systems administration area. [...] DevOps enables more effective and efficient collaboration between the Dev, Ops and Quality Assurance (QA) departments through shared incentives, processes and software tools. DevOps improves the quality of the software, the speed of development and delivery as well as the cooperation between the teams involved.”
Source: Wikipedia

Slide 7

Slide 7 text

“You build it, you run it.”
Werner Vogels, VP and CTO at Amazon.com, 2006

Slide 8

Slide 8 text

Source: Amazon Web Services

Slide 9

Slide 9 text

More than just kubectl apply -f
● Security
● Compliance
● Integration
● Reliability
● Scalability
● KRITIS, GDPR
● Cost Efficiency
● AuthX
● Maintenance

Slide 10

Slide 10 text

Platform Engineering

Slide 11

Slide 11 text

Platform Engineering
● Specialisation of roles to reduce cognitive load
● Still DevOps; the central interface is the platform
● Re-use and organisational scaling
● Automated integration means more software engineering
“Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era. Platform engineers provide an integrated product most often referred to as an “Internal Developer Platform” covering the operational necessities of the entire lifecycle of an application.”
Source: Humanitec

Slide 12

Slide 12 text

Developer Platform
Diagram labels: App, Platform Engineer, Developer

Slide 13

Slide 13 text

Internal Developer Platforms - zoomed in
Pure compute platforms are not IDPs:
● Corporate requirements and services need to be integrated
● e.g. GitLab, AuthX, processes…
Source: Amazon Web Services

Slide 14

Slide 14 text

Platform Engineering for AI

Slide 15

Slide 15 text

AI operational challenges
● Avoid uncontrolled growth like the sprawl seen with cloud-native
● Make cost control easier
● Encapsulate complexity where possible
● Increase development speed
● Make sure compliant solutions are created
● Build a central knowledge hub so teams do not hit the same pitfalls again and again
● Enable easy access to data lakes at a single point

Slide 16

Slide 16 text

AI platform levels
Differentiate between
● Low Level
  ○ APIs (LLM, Embedding, VectorDB)
● Medium Level
  ○ API for RAG with UI components
  ○ Test Framework
● High Level
  ○ ChatBot UI for each employee
  ○ No-code UI solutions
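One way to picture the levels: a medium-level RAG call can be composed from the low-level embedding and vector-search APIs. The following is only a minimal sketch under stated assumptions. `embed`, `VectorStore` and `rag_answer` are hypothetical names, the bag-of-words "embedding" is a toy stand-in for a real model, and the default `llm` simply echoes the prompt.

```python
import math
from collections import Counter

# --- Low level: embedding and vector search (toy stand-ins) ---

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """Hypothetical in-memory vector DB."""

    def __init__(self) -> None:
        self._docs = []  # list of (document, embedding) pairs

    def add(self, doc: str) -> None:
        self._docs.append((doc, embed(doc)))

    def search(self, query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self._docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

# --- Medium level: a RAG call built from the low-level APIs ---

def rag_answer(store: VectorStore, question: str, llm=lambda prompt: prompt) -> str:
    """Retrieve context, build a prompt, and pass it to an LLM callable."""
    context = "\n".join(store.search(question, k=1))
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```

A high-level offering (chatbot UI, no-code tooling) would then sit on top of `rag_answer` without exposing the store or the embedding at all.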

Slide 17

Slide 17 text

Possible components
● Manage quota & scale of AI/LLM
● Testing & Evaluation framework
  ○ e.g. RAGAs-aaS
● GPT-aaS (router function and version mgmt)
● RAG-aaS
● Embedding-aaS
● Internal ChatGPT with corporate-internal knowledge
● Guardrails
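The "router function and version mgmt" plus quota management could look roughly like the sketch below. It is an illustration under assumptions, not a description of an existing service: `ModelRouter`, the alias `chat-default` and the model names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ModelRouter:
    """Hypothetical GPT-as-a-service router with per-team token quotas."""
    # Maps a stable alias to the concrete model version currently served.
    versions: dict
    # Remaining token budget per team.
    quotas: dict
    # (team, model, tokens) records for cost reporting.
    usage_log: list = field(default_factory=list)

    def route(self, team: str, alias: str, tokens: int) -> str:
        """Return the concrete model for `alias`, charging `tokens` to `team`."""
        if alias not in self.versions:
            raise KeyError(f"unknown model alias: {alias}")
        remaining = self.quotas.get(team, 0)
        if tokens > remaining:
            raise PermissionError(f"quota exceeded for team {team}")
        self.quotas[team] = remaining - tokens
        model = self.versions[alias]
        self.usage_log.append((team, model, tokens))
        return model
```

Because callers only know the alias, the platform team can move `chat-default` to a newer model version centrally, and the usage log gives one place to look at cost per team.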

Slide 18

Slide 18 text

Yes, please! …but how?
AI Platform Core
● Company applications or digital products, on-premises or cloud-native
● Platform AI services with standardized interfaces, instantiated for a specific domain
● Directly usable AI chat with internal data
● Integration of the security solutions
● Integration of the source systems and data lakes

Slide 19

Slide 19 text

Yes, please! …but how? Technical terms
● Large Language Model
● Data sources
● Prompt Template
● Memory
● APIs / Tools
● Guardrails
● Agent
● Control Flow
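To make a few of these terms concrete, here is a minimal sketch of how a prompt template, memory, guardrails and a simple control flow might fit together in one chat turn. Everything in it is an assumption for illustration: the `PROMPT` layout, the naive `BLOCKED_TERMS` list and the `chat_turn` function are hypothetical, and `llm` is any callable that maps a prompt string to an answer string.

```python
from string import Template

# Prompt template: fixed layout with slots for memory and the new question.
PROMPT = Template("History:\n$history\n\nUser: $question\nAssistant:")

# Illustrative guardrail list; a real platform would use a proper policy service.
BLOCKED_TERMS = {"password", "api-key"}

def guardrail(text: str) -> bool:
    """Return True if the text passes the (very naive) guardrail."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def chat_turn(question: str, memory: list, llm) -> str:
    """One pass through the control flow: guard, prompt, call, guard, remember."""
    if not guardrail(question):
        return "Request blocked by guardrail."
    prompt = PROMPT.substitute(history="\n".join(memory), question=question)
    answer = llm(prompt)
    if not guardrail(answer):
        return "Response blocked by guardrail."
    # Memory: only turns that passed both checks are kept for later prompts.
    memory.append(f"User: {question}")
    memory.append(f"Assistant: {answer}")
    return answer
```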

Slide 20

Slide 20 text

Principles and Patterns

Slide 21

Slide 21 text

Versioned, Decentralized, User-centered, Customizable, Transparent, Self-service

Slide 22

Slide 22 text

Versioned Infrastructure like software

Slide 23

Slide 23 text

IDPs versioned like software
● Versioned, with tags and release notes
● Releases controlled by pipelines
● E2E test on every version
● Automated delivery (Patch, Pipeline, Test)

    # run from IDP template repository
    # create a patch file
    git diff v41..v42 > /tmp/v42.patch

    # run in concrete instance repository
    # test if patch is applicable in instance
    git apply --check --index /tmp/v42.patch
    # apply changes (--index also stages files that the patch adds)
    git apply --index /tmp/v42.patch
    git commit -m "IDP upgrade v41 → v42"
    git push
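The patch-based upgrade could be driven from a pipeline roughly as follows. This is a sketch under assumptions: `Step` and `plan_upgrade` are hypothetical helpers that only build the git invocations, and the plan uses `git apply --index` so that files newly added by the patch are staged as well.

```python
from typing import NamedTuple, Optional

class Step(NamedTuple):
    """One git invocation of the upgrade (hypothetical helper type)."""
    repo: str                        # "template" or "instance"
    argv: list                       # command and arguments
    stdout_to: Optional[str] = None  # file that receives stdout, if any

def plan_upgrade(old: str, new: str, patch_dir: str = "/tmp") -> list:
    """Build the ordered steps for upgrading an instance from `old` to `new`."""
    patch = f"{patch_dir}/{new}.patch"
    return [
        # Template repository: write the diff between the two tags to a patch file.
        Step("template", ["git", "diff", f"{old}..{new}"], stdout_to=patch),
        # Instance repository: verify first, then apply and stage, commit, push.
        Step("instance", ["git", "apply", "--check", "--index", patch]),
        Step("instance", ["git", "apply", "--index", patch]),
        Step("instance", ["git", "commit", "-m", f"IDP upgrade {old} → {new}"]),
        Step("instance", ["git", "push"]),
    ]
```

Each step could then be executed with `subprocess.run(step.argv, check=True)` in the matching repository, with stdout written to `step.stdout_to` where it is set. Stopping at the first failing step keeps an instance untouched when the patch does not apply.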

Slide 24

Slide 24 text

Decentralised Shared risks and fast iterations

Slide 25

Slide 25 text

Central Multi-Tenant Platform
● Scalability, e.g. Prometheus, OpenSearch, GitOps
● Isolation, e.g. Docker
● Multi-Tenancy, e.g. RBAC, Grafana Stack
● Coordination, e.g. K8s deprecations, CRDs
● Single Point of Failure, e.g. API Gateway Route

Slide 26

Slide 26 text

Source: Amazon Web Services

Slide 27

Slide 27 text

User-centered Product Mindset: Users First!

Slide 28

Slide 28 text

Developer UX
■ User Guide

Slide 29

Slide 29 text

Developer UX
■ User Guide
■ Subtemplates, Modules, Blueprints for golden paths

    base-chart-spring:
      name: my-deployment
      version: '1-snapshot_a5d5547f_13561_master'
      springProfiles:
        - name: k8s
          content: |
            my-deployment:
              business:
                refresh-interval: PT5m
                api-key: ksyajdf4038dsse
      envSecrets:
        SPRING_DATASOURCE_URL:
          secretName: postgres-my-deployment
          key: jdbc
      allowConnectionsFrom:
        - nginx-ingress
        - my-other-deployment

    module "postgresql_..." {
      source           = "git::https://.../.../modules/postgresql.git?ref=1.0.4"
      resource_group   = azurerm_resource_group.this
      kube_outbound_ip = module.aks.lb_public_ip_outbound
      sku_name         = local.config.postgres_sku_name
      subnet_id        = module.vnet.subnet_id
      kube_namespace   = "default"
      tags             = local.standard_tags
    }

Slide 30

Slide 30 text

Developer UX
■ User Guide
■ Subtemplates, Modules, Blueprints for golden paths
■ Scaffolding for typical use cases

Slide 31

Slide 31 text

Developer UX
■ User Guide
■ Subtemplates, Modules, Blueprints for golden paths
■ Scaffolding for typical use cases
■ Tools for Observability, Debugging…

Slide 32

Slide 32 text

Developer UX
■ User Guide
■ Subtemplates, Modules, Blueprints for golden paths
■ Scaffolding for typical use cases
■ Tools for Observability, Debugging…
■ Support
■ Fully integrated

Slide 33

Slide 33 text

Customisable Trail mix for the silver path.

Slide 34

Slide 34 text

Trail mix
● Switching off compliance enforcement is a central feature
● It should be fine-grained
● Control adjustments to the reference, e.g. via CODEOWNERS and MRs
● Defined docking interfaces, e.g. trigger tokens and webhooks

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sDenyLoadbalancerService
    metadata:
      name: deny-loadbalancer-service
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Service"]
      parameters:
        allowedLoadbalancers:
          - 'traefik/traefik'

    /CODEOWNERS @platform-team
    /01-infra/ @platform-team
    /02-user/ @user-team-foo
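How the CODEOWNERS split gates adjustments can be sketched as a small owner lookup. This is a simplified illustration, not a full implementation of the CODEOWNERS format: `parse_codeowners` and `owners_for` are hypothetical helpers, paths are written relative to the repository root with a leading slash, only exact file paths and directory prefixes are handled, and the last matching rule is assumed to win.

```python
def parse_codeowners(text: str) -> list:
    """Parse 'pattern owner...' lines, skipping blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules

def owners_for(path: str, rules: list) -> list:
    """Return the owners of `path`; the last matching rule wins.

    Only two pattern forms are handled here: an exact file path and a
    directory prefix ending in '/'. Real CODEOWNERS files also support globs.
    """
    owners = []
    for pattern, rule_owners in rules:
        if pattern.endswith("/"):
            matched = path.startswith(pattern)
        else:
            matched = path == pattern
        if matched:
            owners = rule_owners
    return owners
```

With the three rules from the slide, a change under `/02-user/` needs approval from `@user-team-foo` only, while changes to `/01-infra/` or to the CODEOWNERS file itself go through `@platform-team`.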

Slide 35

Slide 35 text

Conclusion: Batteries included, but changeable.

Slide 36

Slide 36 text

Transparent is the new abstracted

Slide 37

Slide 37 text

“Platforms reduce cognitive load by exposing useful abstractions. Good abstractions form a cohesive language and useful mental model. Omitting relevant details is tempting but ends up with dangerous illusions.”
Gregor Hohpe @ PlatformCon 2023, author of Cloud Strategy
Tasks for Developer Platforms:
● Build understandable abstractions with escape hatches
● Understand the limitations of your own abstractions (e.g. Build vs Runtime) ...
● ... and consider them for DevEx (Debugging, Alerting)
● Cloud Services offer ready-made abstractions
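An abstraction with an escape hatch can be as small as the sketch below. It is illustrative only: `render_deployment` and its `overrides` parameter are hypothetical, and the generated dictionary merely resembles a Kubernetes Deployment.

```python
import copy
from typing import Optional

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into a copy of `base`."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result

def render_deployment(name: str, image: str, replicas: int = 2,
                      overrides: Optional[dict] = None) -> dict:
    """Golden-path abstraction: a few inputs expand to a Deployment-like dict.

    `overrides` is the escape hatch: it is merged over the generated
    structure so users can reach details the abstraction hides.
    """
    manifest = {
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "template": {"spec": {"containers": [{"name": name, "image": image}]}},
        },
    }
    return deep_merge(manifest, overrides or {})
```

Returning the fully rendered structure, rather than hiding it, is what keeps the abstraction transparent: developers can inspect exactly what will run when they debug or set up alerting.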

Slide 38

Slide 38 text

Inner Source
● All code is open internally
● Each instance of an IDP is open
● Reference IDP is open
  ○ Issue Tracker
  ○ Roadmap
  ○ PRs welcome
● Community events: new features, exchange of ideas…

Slide 39

Slide 39 text

Self-Service Skiing without the danger of avalanches

Slide 40

Slide 40 text

Self-Service
The life cycle of an IDP is under the full control of the user and generally requires no interaction from the platform team.
● Creating, deleting and upgrading an IDP instance is initiated by users
● Tools: CLI, UIs, Pipelines
● Automated processes monitor and enforce compliance and quality
● Few platform engineers (PEs) are needed to operate a large number of IDPs
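The "automated processes monitor and enforce compliance" point could be implemented as a periodic check over all instances. The sketch below is purely hypothetical: the `Instance` record, the integer version numbering and the `max_lag` threshold are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instance:
    """Hypothetical record describing one IDP instance."""
    name: str
    version: int                 # template version the instance runs, e.g. 42
    policies_enabled: frozenset  # names of the policies currently enforced

def compliance_findings(instance: Instance, reference_version: int,
                        required_policies: frozenset, max_lag: int = 2) -> list:
    """Return human-readable findings; an empty list means compliant."""
    findings = []
    lag = reference_version - instance.version
    if lag > max_lag:
        findings.append(f"{instance.name}: {lag} versions behind reference")
    for policy in sorted(required_policies - instance.policies_enabled):
        findings.append(f"{instance.name}: required policy '{policy}' is disabled")
    return findings
```

Run from a scheduled pipeline, such a check lets users keep control of their instance while the platform team only reacts to findings instead of operating every instance by hand.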

Slide 41

Slide 41 text

Building blocks

Slide 42

Slide 42 text

Building blocks (Capabilities)

Slide 43

Slide 43 text

Example: AWS-native with AWS Proton

Slide 44

Slide 44 text

AWS Proton: Developer Platform as a Service
(Platform Engineers)
● User-centered ✅
● Self-service ✅
● Decentralized ✅
● Versioned ✅
● Customizable ✅
● Transparent ✅

Slide 47

Slide 47 text

CNCF/K8s
Orchestrator for IDP + CNCF Components + Git

Slide 48

Slide 48 text

Mapping capabilities to implementations
Capability | Tool/Method k8s/CNCF | Tool/Method AWS
Provisioning Engine | Terraform, ArgoCD, Kubernetes Operators | AWS CloudFormation
CI/CD | GitLab CI, Argo Workflows | AWS CodePipeline
Source Code | GitLab CI | AWS CodeCommit
Pattern Repository | Git Repository | AWS Proton, AWS Service Catalog
Managed Services | Cloud Services | AWS services, AWS Private Marketplace
Developer Portal | Backstage, GitLab Pages | AWS Proton, AWS Service Catalog
CLI | Code | AWS CLI (Proton Commands)
Deployment Service | Code, Crossplane | AWS Proton
Managed Environments | Code / Git | AWS Proton, AWS Control Tower
Governance | Open Policy Agent, AWS Config | AWS Control Tower, AWS Config, AWS SecurityHub, Amazon GuardDuty, Amazon Inspector
CNCF Platforms White Paper: https://tag-app-delivery.cncf.io/whitepapers/platforms/

Slide 49

Slide 49 text

Mapping capabilities to implementations (same table as on slide 48)
CNCF Platforms White Paper: https://tag-app-delivery.cncf.io/whitepapers/platforms/
tl;dr: Choose your ecosystem

Slide 50

Slide 50 text

KUDOS to him for the idea and his presentation. 🚀
Alex Krause
Software Architect, QAware
passionate about scalable platform engineering in conjunction with cloud-native microservices
🐦 @alex0ptr

Slide 51

Slide 51 text

Robert Hoffmann 💪 😎
Solutions Architect @awscloud
formerly:
● @DeutscheTelekom
● @Samsung
Former Product Owner at Hallo Magenta, co-ideation for this talk, cool dude 😎
I move boxes around to help people move boxes around.
🐦 @robhoffmax

Slide 52

Slide 52 text

Q & A

Slide 53

Slide 53 text

qaware.de QAware GmbH Mainz Rheinstraße 4 C 55116 Mainz Tel. +49 6131 21569-0 [email protected] twitter.com/qaware linkedin.com/company/qaware-gmbh xing.com/companies/qawaregmbh slideshare.net/qaware github.com/qaware (DE)

Slide 54

Slide 54 text

Abstract
How do we help our developers to fly instead of crashing miserably? The answer is Platform Engineering, a discipline for building internal developer platforms (IDPs) to simplify software delivery for product teams. In this talk, you'll learn how Platform Engineering evolved from the DevOps movement and what principles and best practices make for a good implementation. After that, we'll take a look at reference architectures that can support your platform. Besides that, we will discuss the use of AI platforms and how they help accelerate the adoption of LLMs and RAG company-wide.