Make developers fly: Principles for platform engineering

This talk is about how to properly do platform engineering and how these principles apply to AI platforms.

The talk can be seen here: https://www.youtube.com/live/tkR1RT1NY5M?t=2735

Many thanks to https://speakerdeck.com/alex0ptr.

Alexander Eimer

August 12, 2024

Transcript

  1. 5 QAware DevOps: Wall of Confusion

    Developer, key tasks:
    • Able to react quickly to market changes and develop new features.
    • Success is often measured by the frequency of deliveries.
    Operator, key tasks:
    • Stable, secure and reliable services for customers.
    • Success is often measured by the reliability of the system.
    Consequences:
    • Opposing goals lead to conflict, mistrust and ultimately to the creation of silos.
    • Software is “thrown over the fence” without consideration of operational feasibility or operational aspects.
    • Operations complicates deliveries through bureaucratic processes in order to maintain control.
    • In the worst case, this results in frequent downtimes, poor response times and stagnation of the value chain, which threatens all business areas.
  2. 6 DevOps: Definition

    “DevOps describes a process improvement method from the software development and systems administration area. [...] DevOps enables more effective and efficient collaboration between the Dev, Ops and Quality Assurance (QA) departments through shared incentives, processes and software tools. DevOps improves the quality of the software, the speed of development and delivery as well as the cooperation between the teams involved.” (Wikipedia)
  3. 9 More than just kubectl apply -f

    • Security
    • Compliance
    • Integration
    • Reliability
    • Scalability
    • KRITIS, GDPR
    • Cost Efficiency
    • AuthX
    • Maintenance
  4. 11 Platform Engineering

    • Specialisation of roles to reduce cognitive load
    • Still DevOps, with the platform as the central interface
    • Re-use and organisational scaling
    • Automated integration means more software engineering

    “Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era. Platform engineers provide an integrated product most often referred to as an “Internal Developer Platform” covering the operational necessities of the entire lifecycle of an application.” (Humanitec)
  5. 13 Internal Developer Platforms, zoomed in

    Not IDPs: pure compute platforms
    • Corporate requirements and services need to be integrated, e.g. GitLab, AuthX, processes…
    Source: Amazon Web Services
  6. 15 AI operational challenges

    • Avoid uncontrolled growth, as happened with cloud-native
    • Cost control is easier
    • Encapsulate complexity where possible
    • Increase development speed
    • Make sure compliant solutions are created
    • Build a central knowledge hub so that teams do not run into the same pitfalls again and again
    • Enable easy access to data lakes through a single point
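
One way to read "cost control is easier": if all AI traffic passes through one central platform gateway, spend can be attributed per team from a single usage log. A minimal Python sketch under that assumption; the `UsageRecord` type, the model aliases and the per-1,000-token prices are hypothetical placeholders, not figures from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices per 1,000 tokens; real prices depend on provider and model.
PRICE_PER_1K_TOKENS = {"chat-large": 0.03, "embed-small": 0.0001}


@dataclass(frozen=True)
class UsageRecord:
    team: str    # tenant that issued the request
    model: str   # model alias as seen by the central gateway
    tokens: int  # tokens consumed by the request


def cost_per_team(records):
    """Sum the estimated spend per team from a central usage log."""
    totals = defaultdict(float)
    for record in records:
        totals[record.team] += record.tokens / 1000 * PRICE_PER_1K_TOKENS[record.model]
    return dict(totals)


if __name__ == "__main__":
    log = [
        UsageRecord("team-a", "chat-large", 2000),
        UsageRecord("team-a", "embed-small", 10000),
        UsageRecord("team-b", "chat-large", 500),
    ]
    for team, cost in sorted(cost_per_team(log).items()):
        print(f"{team}: {cost:.4f}")
```

The point of the design is that attribution needs no cooperation from individual applications; it only requires that the platform is the single entry point.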
  7. 16 AI platform levels

    Differentiate between:
    • Low level
      ◦ APIs (LLM, Embedding, VectorDB)
    • Medium level
      ◦ API for RAG with UI components
      ◦ Test framework
    • High level
      ◦ ChatBot UI for each employee
      ◦ No-code UI solutions
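
To make the low level more concrete: a VectorDB API essentially stores embedding vectors and returns the entries most similar to a query vector. A toy in-memory sketch in Python, assuming the embeddings are already computed; the 3-dimensional vectors and document texts below are made up, and a real platform would call an embedding service and a dedicated vector database instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a VectorDB API: upsert embeddings, query the top-k most similar."""

    def __init__(self):
        self._items = {}  # doc_id -> (vector, text)

    def upsert(self, doc_id, vector, text):
        self._items[doc_id] = (vector, text)

    def query(self, vector, top_k=3):
        scored = [
            (cosine_similarity(vector, stored), doc_id, text)
            for doc_id, (stored, text) in self._items.items()
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [(doc_id, text) for _, doc_id, text in scored[:top_k]]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.upsert("vacation", [0.9, 0.1, 0.0], "Vacation policy text")
    store.upsert("expenses", [0.1, 0.9, 0.0], "Expense policy text")
    print(store.query([0.8, 0.2, 0.0], top_k=1))
```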
  8. 17 Possible components

    • Manage quota & scale of AI/LLM
    • Testing & evaluation framework
      ◦ e.g. RAGAs-aaS
    • GPT-aaS (router function and version management)
    • RAG-aaS
    • Embedding-aaS
    • Internal ChatGPT with corporate-internal knowledge
    • Guardrails
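
A sketch of how "router function and version management" and quota management could fit together: applications call a stable alias, the platform maps it to a pinned model version and reserves tokens from a per-tenant budget. Everything here (the `ModelRouter` class, alias and version names, budget numbers) is hypothetical, and the actual model call is deliberately left out.

```python
class QuotaExceeded(Exception):
    """Raised when a tenant would exceed its token budget."""


class ModelRouter:
    def __init__(self, aliases, budgets):
        # alias -> concrete, versioned model name, e.g. "chat" -> "chat-model-v1"
        self.aliases = dict(aliases)
        # tenant -> remaining token budget for the current period
        self.remaining = dict(budgets)

    def promote(self, alias, version):
        """Switch an alias to a new model version centrally; clients keep the alias."""
        self.aliases[alias] = version

    def route(self, tenant, alias, estimated_tokens):
        """Resolve the alias and reserve tokens from the tenant's budget."""
        if alias not in self.aliases:
            raise KeyError(f"unknown model alias: {alias}")
        left = self.remaining.get(tenant, 0)
        if estimated_tokens > left:
            raise QuotaExceeded(f"{tenant}: {estimated_tokens} requested, {left} left")
        self.remaining[tenant] = left - estimated_tokens
        return self.aliases[alias]


if __name__ == "__main__":
    router = ModelRouter({"chat": "chat-model-v1"}, {"team-a": 1000})
    print(router.route("team-a", "chat", 400))
    router.promote("chat", "chat-model-v2")
    print(router.route("team-a", "chat", 400))
```

Promoting a version in one place, rather than in every application, is what makes central version management cheap; a rejected request leaves the budget untouched.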
  9. Yes, please! …but how?

    AI Platform Core:
    • Company applications or digital products, on-premise or cloud-native
    • Platform AI services with standardized interfaces, instantiated for a specific domain
    • Directly usable AI chat with internal data
    • Integration of the security solutions
    • Integration of the source systems and data lakes
  10. Yes, please! …but how? Technical terms

    • Large Language Model
    • Data sources
    • Prompt Template
    • Memory
    • APIs / Tools
    • Guardrails
    • Agent Control Flow
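
A deliberately simplified sketch of how several of these terms interact in one request: a guardrail screens the question, a prompt template combines snippets from data sources with conversation memory, and the result would then go to the LLM. The template text, the blocked-terms list and the memory size are assumptions; production guardrails are far more elaborate than a keyword check.

```python
from collections import deque
from string import Template

# Hypothetical prompt template with placeholders for context, history and question.
PROMPT = Template(
    "Answer using only the context.\n"
    "Context:\n$context\n"
    "History:\n$history\n"
    "Question: $question"
)

BLOCKED_TERMS = {"password", "salary of"}  # illustrative input guardrail only


class Conversation:
    def __init__(self, memory_size=4):
        # Memory: keep only the last few user turns to bound the prompt length.
        self.memory = deque(maxlen=memory_size)

    def build_prompt(self, question, snippets):
        """Apply the guardrail, fill the template, then record the turn in memory."""
        lowered = question.lower()
        if any(term in lowered for term in BLOCKED_TERMS):
            raise ValueError("question rejected by guardrail")
        prompt = PROMPT.substitute(
            context="\n".join(snippets),
            history="\n".join(self.memory),
            question=question,
        )
        self.memory.append(f"User: {question}")
        return prompt


if __name__ == "__main__":
    conv = Conversation()
    print(conv.build_prompt("How many vacation days?", ["Vacation policy text"]))
```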
  11. 21 Versioned, Decentralized, User-centered, Customizable, Transparent, Self-service
  12. 23 IDPs versioned like software

    • Versioned, with tags and release notes
    • Releases controlled by pipelines
    • E2E test on every version
    • Automated delivery (patch, pipeline, test)

      # run in the IDP template repository: create a patch file
      git diff v41..v42 > /tmp/v42.patch

      # run in the concrete instance repository
      # check whether the patch applies cleanly
      git apply --check /tmp/v42.patch
      # apply the changes; --index also stages files the patch creates,
      # which "git commit -a" alone would not pick up
      git apply --index /tmp/v42.patch
      git commit -m "IDP upgrade v41 → v42"
      git push
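
When an instance is several releases behind, the same patch-per-release approach can be repeated release by release, so that every step corresponds to a version that passed its E2E tests. A hedged Python sketch that only generates the command sequence (it does not run anything); it assumes consecutive integer tags `v<N>` as in v41/v42, and the function name `upgrade_commands` is hypothetical. It uses `git apply --index` so that files created by a patch are staged for the commit.

```python
def upgrade_commands(current, target, patch_dir="/tmp"):
    """Return the shell commands to move an IDP instance from v<current> to v<target>.

    Assumes releases are tagged v<N> with consecutive integers (v41, v42, ...).
    """
    if target <= current:
        return []
    commands = []
    for old in range(current, target):
        new = old + 1
        patch = f"{patch_dir}/v{new}.patch"
        commands += [
            f"git diff v{old}..v{new} > {patch}",  # in the template repository
            f"git apply --check {patch}",          # in the instance repository
            f"git apply --index {patch}",          # apply and stage, incl. new files
            f'git commit -m "IDP upgrade v{old} → v{new}"',
        ]
    commands.append("git push")
    return commands


if __name__ == "__main__":
    print("\n".join(upgrade_commands(40, 42)))
```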
  13. 25 Central Multi-Tenant Platform

    • Scalability, e.g. Prometheus, OpenSearch, GitOps
    • Isolation, e.g. Docker
    • Multi-tenancy, e.g. RBAC, Grafana stack
    • Coordination, e.g. K8s deprecations, CRDs
    • Single point of failure, e.g. API gateway route
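
Multi-tenancy via RBAC is commonly implemented as one namespace per tenant plus a Role and RoleBinding scoped to that namespace. A Python sketch that builds such manifests as plain dicts; the `tenant-<name>` namespace convention, the group name and the chosen resources and verbs are assumptions, and the namespace itself is expected to exist already.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def tenant_role(tenant, resources=("pods", "services", "configmaps"),
                verbs=("get", "list", "watch", "create", "update", "delete")):
    """Namespaced Role: the tenant may manage core resources only in its own namespace."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": f"{tenant}-developer", "namespace": f"tenant-{tenant}"},
        "rules": [{"apiGroups": [""], "resources": list(resources), "verbs": list(verbs)}],
    }


def tenant_role_binding(tenant, group):
    """Bind the tenant Role to the tenant's group from the identity provider."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{tenant}-developer", "namespace": f"tenant-{tenant}"},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": f"{tenant}-developer", "apiGroup": RBAC_API},
    }


if __name__ == "__main__":
    # A v1 List can be piped to "kubectl apply -f -" (JSON is valid input).
    manifests = {"apiVersion": "v1", "kind": "List",
                 "items": [tenant_role("foo"), tenant_role_binding("foo", "team-foo")]}
    print(json.dumps(manifests, indent=2))
```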
  14. 29 Developer UX

    ▪ User Guide
    ▪ Subtemplates, Modules, Blueprints for golden paths

      base-chart-spring:
        name: my-deployment
        version: '1-snapshot_a5d5547f_13561_master'
        springProfiles:
          - name: k8s
            content: |
              my-deployment:
                business:
                  refresh-interval: PT5m
                  api-key: ksyajdf4038dsse
        envSecrets:
          SPRING_DATASOURCE_URL:
            secretName: postgres-my-deployment
            key: jdbc
        allowConnectionsFrom:
          - nginx-ingress
          - my-other-deployment

      module "postgresql_..." {
        source           = "git::https://.../.../modules/postgresql.git?ref=1.0.4"
        resource_group   = azurerm_resource_group.this
        kube_outbound_ip = module.aks.lb_public_ip_outbound
        sku_name         = local.config.postgres_sku_name
        subnet_id        = module.vnet.subnet_id
        kube_namespace   = "default"
        tags             = local.standard_tags
      }
  15. 30 Developer UX, adding:

    ▪ Scaffolding for typical use cases
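
Scaffolding means generating a ready-to-run skeleton from a template, so that a new service starts on the golden path instead of from an empty repository. A minimal Python sketch using `string.Template`; the file layout, placeholders and file contents are hypothetical and only loosely echo the `base-chart-spring` values from slide 29.

```python
from pathlib import Path
from string import Template

# Hypothetical golden-path template files for a new Spring service.
TEMPLATES = {
    "README.md": "# $name\n\nOwned by $team. Generated from the golden-path template.\n",
    "deploy/values.yaml": (
        "base-chart-spring:\n"
        "  name: $name\n"
        "  allowConnectionsFrom:\n"
        "    - nginx-ingress\n"
    ),
}


def scaffold(target_dir, name, team):
    """Render every template into target_dir; refuse to overwrite existing files."""
    root = Path(target_dir)
    written = []
    for rel_path, text in TEMPLATES.items():
        path = root / rel_path
        if path.exists():
            raise FileExistsError(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(Template(text).substitute(name=name, team=team))
        written.append(path)
    return written


if __name__ == "__main__":
    for created in scaffold("my-service", "my-service", "team-foo"):
        print("created", created)
```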
  16. 31 Developer UX, adding:

    ▪ Tools for observability, debugging…
  17. 32 Developer UX

    ▪ User Guide
    ▪ Subtemplates, Modules, Blueprints for golden paths
    ▪ Scaffolding for typical use cases
    ▪ Tools for observability, debugging…
    ▪ Support
    ▪ Fully integrated
  18. 34 Trail mix

    • Switching off compliance enforcement is a central feature
    • It should be fine-grained
    • Control adjustments to the reference, e.g. via CODEOWNERS and merge requests (MRs)
    • Defined docking interfaces, e.g. trigger tokens and webhooks

      apiVersion: constraints.gatekeeper.sh/v1beta1
      kind: K8sDenyLoadbalancerService
      metadata:
        name: deny-loadbalancer-service
      spec:
        match:
          kinds:
            - apiGroups: [""]
              kinds: ["Service"]
        parameters:
          allowedLoadbalancers:
            - 'traefik/traefik'

      /CODEOWNERS   @platform-team
      /01-infra/    @platform-team
      /02-user/     @user-team-foo
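
The Gatekeeper constraint only references a constraint template whose Rego policy is not shown on the slide. To illustrate the intended decision logic, here is a Python re-implementation under an assumption: `allowedLoadbalancers` lists permitted Services as `namespace/name`, and every other Service of type LoadBalancer is a violation. This is a sketch of the semantics, not the actual ConstraintTemplate.

```python
def violations(obj, allowed_loadbalancers):
    """Return violation messages for a Kubernetes object given as a dict."""
    if obj.get("kind") != "Service":
        return []
    if obj.get("spec", {}).get("type") != "LoadBalancer":
        return []  # ClusterIP, NodePort etc. are not affected by this rule
    meta = obj.get("metadata", {})
    ref = f"{meta.get('namespace', 'default')}/{meta.get('name', '')}"
    if ref in allowed_loadbalancers:
        return []  # explicitly allowed, e.g. the shared ingress controller
    return [f"Service {ref} of type LoadBalancer is not in allowedLoadbalancers"]


if __name__ == "__main__":
    allowed = ["traefik/traefik"]
    shop = {"kind": "Service",
            "metadata": {"name": "shop", "namespace": "team-foo"},
            "spec": {"type": "LoadBalancer"}}
    print(violations(shop, allowed))
```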
  19. 37 “Platforms reduce cognitive load by exposing useful abstractions. Good abstractions form a cohesive language and useful mental model. Omitting relevant details is tempting but ends up with dangerous illusions.” Gregor Hohpe @ PlatformCon 2023, author of Cloud Strategy

    Tasks for developer platforms:
    • Build understandable abstractions with escape hatches
    • Understand the limitations of your own abstractions (e.g. build vs. runtime) ...
    • ... and consider them for DevEx (debugging, alerting)
    • Cloud services offer ready-made abstractions
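
One common pattern for an abstraction with an escape hatch: the platform supplies opinionated defaults, and users may override any single value without forking the whole configuration. A Python sketch of that idea as a recursive merge; the default keys and values are hypothetical.

```python
import copy

# Hypothetical platform defaults hidden behind the abstraction.
PLATFORM_DEFAULTS = {
    "replicas": 2,
    "resources": {"cpu": "500m", "memory": "512Mi"},
    "probes": {"liveness": "/actuator/health/liveness"},
}


def merge(defaults, overrides):
    """Recursively merge overrides into a copy of defaults; overrides win."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


if __name__ == "__main__":
    # Escape hatch: a memory-hungry service raises only its memory limit.
    print(merge(PLATFORM_DEFAULTS, {"resources": {"memory": "2Gi"}}))
```

Because the override touches only one leaf, the service still inherits future changes to all other defaults.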
  20. 38 Inner Source

    • All code is open internally
    • Each instance of an IDP is open
    • Reference IDP is open
      ◦ Issue tracker
      ◦ Roadmap
      ◦ PRs welcome
    • Community events: new features, exchange of ideas…
  21. 40 Self-Service

    The life cycle of an IDP is under the full control of the user and generally requires no interaction with the platform team.
    • Creating, deleting and upgrading an IDP instance is initiated by users
    • Tools: CLI, UIs, pipelines
    • Automated processes monitor and enforce compliance and quality
    • A few platform engineers (PEs) can operate a large number of IDPs
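
"Automated processes monitor and enforce compliance" could, for example, be a scheduled job that compares each instance's IDP version with the current reference release and notifies owners who fall too far behind. A hedged Python sketch; the `Instance` inventory, the integer version scheme (mirroring v41/v42) and the allowed lag of two releases are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    owner: str
    version: int  # IDP release the instance is on, e.g. 41 for v41


def out_of_policy(instances, reference_version, max_lag=2):
    """Return instances lagging more than max_lag releases behind the reference IDP."""
    return [i for i in instances if reference_version - i.version > max_lag]


if __name__ == "__main__":
    fleet = [Instance("shop", "team-foo", 42), Instance("billing", "team-bar", 38)]
    for inst in out_of_policy(fleet, reference_version=42):
        print(f"{inst.name} ({inst.owner}) is on v{inst.version}; please upgrade")
```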
  22. 44 AWS Proton: Developer Platform as a Service

    (Platform Engineers) User-centered ✅ Self-service ✅ Decentralized ✅ Versioned ✅ Customizable ✅ Transparent ✅
  23. 48 Mapping capabilities with implementations

    Capability: Tool/Method k8s/CNCF | Tool/Method AWS
    • Provisioning Engine: Terraform, ArgoCD, Kubernetes Operators | AWS CloudFormation
    • CI/CD: GitLab CI, Argo Workflows | AWS CodePipeline
    • Source Code: GitLab CI | AWS CodeCommit
    • Pattern Repository: Git Repository | AWS Proton, AWS Service Catalog
    • Managed Services: Cloud Services | AWS services, AWS Private Marketplace
    • Developer Portal: Backstage, GitLab Pages | AWS Proton, AWS Service Catalog
    • CLI: Code | AWS CLI (Proton Commands)
    • Deployment Service: Code, Crossplane | AWS Proton
    • Managed Environments: Code / Git | AWS Proton, AWS Control Tower
    • Governance: Open Policy Agent, AWS Config | AWS Control Tower, AWS Config, AWS SecurityHub, Amazon GuardDuty, Amazon Inspector

    CNCF Platforms White Paper: https://tag-app-delivery.cncf.io/whitepapers/platforms/
  24. 49 Mapping capabilities with implementations

    tl;dr: Choose your ecosystem
  25. 🚀 50 KUDOS to him for the idea and his presentation.

    Alex Krause, Software Architect, QAware: passionate about scalable platform engineering in conjunction with cloud-native microservices. 🐦 @alex0ptr
  26. 💪 😎 51 Former Product Owner at Hallo Magenta, co-ideation for this talk, cool dude 😎

    Robert Hoffmann, Solutions Architect @awscloud, formerly: • @DeutscheTelekom • @Samsung. I move boxes around to help people move boxes around. 🐦 @robhoffmax
  27. qaware.de

    QAware GmbH Mainz, Rheinstraße 4 C, 55116 Mainz, Tel. +49 6131 21569-0, [email protected]
    twitter.com/qaware, linkedin.com/company/qaware-gmbh, xing.com/companies/qawaregmbh, slideshare.net/qaware, github.com/qaware (DE)
  28. 54 Abstract

    How do we help our developers to fly instead of crashing miserably? The answer is Platform Engineering, a discipline for building internal developer platforms (IDPs) that simplify software delivery for product teams. In this talk, you'll learn how Platform Engineering evolved from the DevOps movement and which principles and best practices make for a good implementation. After that, we'll take a look at reference architectures that can support your platform. In addition, we will discuss the use of AI platforms and how they help accelerate the use of LLMs and RAG company-wide.