Scaling AI Applications with Kubernetes

Talk at Google Developers Group Lahore on September 14, 2025

Sheharyar Naseer

September 14, 2025

Transcript

  1. Background ✦ Indie nomad software architect ✦ 15+ years of polyglot experience, focus on Web & Cloud ✦ Worked with Apple, Superlist, AllocatorOne, TheScore, Slab, etc. ✦ Stack Overflow: 75,000+ score (Top 5 in Pakistan) ✦ Author / Contributor of multiple famous libraries & tools ✦ Featured on popular developer communities
  2. The AI Explosion ✦ Unprecedented growth and use of AI ✦ 72% of businesses have adopted AI [Source] ✦ 700M+ people used AI apps in the first half of 2025 [Source] ✦ Market size to reach $1.34 trillion USD by 2030 [Source] ✦ Massive investments in startups and infrastructure ✦ Systems need to scale to handle demand
  3. AI Spectrum ✦ External API calls (Gemini, Claude, OpenAI) ✦ Self-hosted or Custom ✦ LLMs ✦ RAG Systems (Vector, Embedding, and other services) ✦ Vision Inference (Detection, Segmentation, etc.) ✦ Traditional ML (Regression, Classification, Clustering, Anomalies, etc.) ✦ Generative models ✦ Hybrid Systems
  4. LLM Dominance ✦ Highest visibility ✦ Mainstream adoption ✦ Platform effect ✦ All the hype ✦ Versatile applications ✦ Even used where it shouldn't be
  5. Growing Pains ✦ Scaling systems is already hard ✦ New sets of challenges per type ✦ Vary greatly by scope ✦ No single strategy that can be applied everywhere
  6. Examples ✦ Common ✦ Resource-bound operations ✦ Over/under-provisioned infrastructure ✦ Cost management ✦ External APIs (Rate-limits) ✦ LLMs (Cold-starts, Scheduling, Fragmentation) ✦ RAGs (Data consistency, Pipeline coordination, Storage) ✦ Hybrid Systems (Coordination + all of the above)
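For the external-API case, rate limits are usually handled on the client side by retrying with exponential backoff. The sketch below is one minimal way to do that; `RateLimited`, `call_with_backoff`, and the delay values are hypothetical names and numbers chosen for illustration, not anything shown in the talk or tied to a specific provider's SDK.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a client wrapper raises on an HTTP 429 response."""


def call_with_backoff(request, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      sleep=time.sleep):
    """Call request(); on RateLimited, wait and retry with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimited:
            # Out of attempts: surface the error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_request():
        # Simulated provider that rejects the first two calls.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RateLimited()
        return "ok"

    print(call_with_backoff(flaky_request, base_delay=0.01))
```

In production a random jitter is commonly added to each delay so that many pods do not retry in lockstep; it is left out here to keep the sketch deterministic.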
  7. What is K8S? ✦ Open-source container automation framework ✦ Simple, modular and declarative ✦ Automated: Monitors, detects & rectifies issues ✦ Primitives ✦ Pods, Services, Volumes, Operators, Scheduling, etc. ✦ Focus ✦ External API AI applications
  8. Scaling Strategies ✦ Vertical Pod Autoscaler (VPA) ✦ Adjusts resources within pods ✦ Horizontal Pod Autoscaler (HPA) ✦ Scales number of pods up/down ✦ K8S Event-Driven Autoscaler (KEDA) ✦ Scales number of pods up/down based on events & custom metrics
  9. Vertical Pod Autoscaler ✦ Adjusts CPU & memory for pods ✦ Recommender: ✦ Calculates optimal resource values by analyzing historical metrics ✦ Sets two parameters: Target utilization & min/max resources allowed ✦ Updater: ✦ Monitors recommendation changes ✦ Replaces pods when adjustments are needed
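The Recommender/Updater split described on this slide can be illustrated with a deliberately simplified sketch. The real VPA recommender is more involved; the percentile, the headroom margin, the drift threshold, and the function names `recommend` and `needs_update` below are all assumptions made for illustration.

```python
import math


def recommend(samples, min_allowed, max_allowed, percentile=90, margin_pct=15):
    """Recommender step: derive a resource request from usage history."""
    if not samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples)
    # Nearest-rank percentile of the observed usage.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    base = ordered[rank - 1]
    # Add headroom on top of the percentile, rounding up to a whole unit.
    target = math.ceil(base * (100 + margin_pct) / 100)
    # Keep the recommendation inside the allowed min/max range.
    return max(min_allowed, min(max_allowed, target))


def needs_update(current_request, recommended, threshold=0.1):
    """Updater step: True when the pod's request drifts too far from target."""
    return abs(current_request - recommended) / recommended > threshold


if __name__ == "__main__":
    # Hypothetical CPU usage history in millicores.
    cpu_usage = [120, 150, 180, 200, 220, 250, 300, 320, 400, 900]
    target = recommend(cpu_usage, min_allowed=100, max_allowed=1000)
    print(target, needs_update(current_request=250, recommended=target))
```

Using a high percentile rather than the maximum keeps a single spike (the 900 sample here) from inflating the request, while the min/max bounds cap how far the recommendation can move.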
  10. Horizontal Pod Autoscaler ✦ Adjusts number of pod replicas ✦ Based on workload ✦ Resource utilization (CPU/memory) ✦ Traffic-driven ✦ Scales both up and down ✦ Most common & widely used pattern
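The replica calculation behind this slide fits in a few lines. The formula follows the one given in the Kubernetes HPA documentation, desired = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The function name, the min/max arguments, and the sample numbers are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return the replica count an HPA-style controller would aim for."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target: scale up.
    print(desired_replicas(4, current_metric=90, target_metric=60))
    # 4 pods averaging 20% CPU against a 60% target: scale down.
    print(desired_replicas(4, current_metric=20, target_metric=60))
```

Because the result is proportional to the ratio of observed to target load, the same function covers both scaling up and scaling down.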
  11. K8S Event-Driven Autoscaler ✦ Supports both Horizontal & Vertical scaling ✦ Based on custom events & metrics ✦ Ideal for: ✦ Event-driven architectures (message queues, event buses) ✦ Serverless-style applications (scale to zero) ✦ Workloads determined by data availability
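The message-queue and scale-to-zero cases can be pictured as sizing the worker pool from the backlog. This is a simplified sketch of that idea, not KEDA's actual implementation; `replicas_for_queue` and its parameters are hypothetical names.

```python
import math


def replicas_for_queue(queue_length, messages_per_replica, max_replicas,
                       min_replicas=0):
    """Size a worker pool from queue backlog, allowing scale to zero."""
    if messages_per_replica <= 0:
        raise ValueError("messages_per_replica must be positive")
    # Nothing to process: drop to the floor (zero by default).
    if queue_length <= 0:
        return min_replicas
    # One replica per batch of pending messages, rounded up.
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for backlog in (0, 1, 12, 1000):
        print(backlog, replicas_for_queue(backlog, messages_per_replica=5,
                                          max_replicas=20))
```

The difference from CPU-based scaling is the signal: the pool grows as soon as work arrives in the queue, before existing pods become saturated, and shrinks to nothing when the queue is empty.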
  12. More Tools & Platforms ✦ We've only scratched the surface ✦ Core Tools ✦ MLflow ✦ Seldon Core ✦ KServe ✦ Runtimes ✦ NVIDIA Triton ✦ TensorFlow Serving