Scaling AI Applications with Kubernetes

Slide 1

Slide 1 text

Google Developers Group Lahore ▪︎ September 2025 SCALING AI APPLICATIONS WITH KUBERNETES

Slide 2

Slide 2 text

Outline Systems architect & technology advisor for startups and enterprises. Sheharyar Naseer Find me online @sheharyarn

Slide 3

Slide 3 text

Background ✦ Indie nomad software architect ✦ 15+ years of polyglot experience, focus on Web & Cloud ✦ Worked with Apple, Superlist, AllocatorOne, TheScore, Slab, etc. ✦ Stack Overflow: 75,000+ score (Top 5 in Pakistan) ✦ Author / Contributor of multiple famous libraries & tools ✦ Featured on popular developer communities

Slide 4

Slide 4 text

Outline ✦ State of AI ✦ Scaling Challenges ✦ Using Kubernetes ✦ Q/A

Slide 5

Slide 5 text

State of AI What does the AI landscape look like today?

Slide 6

Slide 6 text

The AI Explosion ✦ Unprecedented growth and use of AI ✦ 72% of businesses have adopted AI [Source] ✦ 700M+ people used AI apps in the first half of 2025 [Source] ✦ Market size to reach $1.34 Trillion USD by 2030 [Source] ✦ Massive investments in startups and infrastructure ✦ Systems need to scale to handle demand

Slide 7

Slide 7 text

AI Spectrum ✦ External API calls (Gemini, Claude, OpenAI) ✦ Self-hosted or Custom ✦ LLMs ✦ RAG Systems (Vector, Embedding, and other services) ✦ Vision Inference (Detection, Segmentation, etc.) ✦ Traditional ML (Regression, Classification, Clustering, Anomalies, etc.) ✦ Generative models ✦ Hybrid Systems

Slide 8

Slide 8 text

LLM Dominance ✦ Highest visibility ✦ Mainstream adoption ✦ Platform effect ✦ All the hype ✦ Versatile applications ✦ Even used where it shouldn't be

Slide 9

Slide 9 text

Scaling Challenges Common problems when scaling AI applications

Slide 10

Slide 10 text

Growing Pains ✦ Scaling systems is already hard ✦ New sets of challenges per type ✦ Vary greatly by scope ✦ No single strategy that can be applied everywhere

Slide 11

Slide 11 text

Examples ✦ Common ✦ Resource-bound operations ✦ Over/under-provisioned infrastructure ✦ Cost management ✦ External APIs (Rate-limits) ✦ LLMs (Cold-starts, Scheduling, Fragmentation) ✦ RAGs (Data consistency, Pipeline coordination, Storage) ✦ Hybrid Systems (Coordination + all of the above)
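For the rate-limit challenge with external APIs, one common mitigation is to retry with exponential backoff and jitter. The sketch below is a minimal illustration, not tied to any provider's SDK: `RateLimitError`, `backoff_delay`, and `call_with_retry` are hypothetical names, and the base delay and cap are assumed values.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the upstream API rejects a call (e.g. HTTP 429)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound of the wait before retry number `attempt` (0-based), doubling each time."""
    return min(cap, base * 2 ** attempt)


def call_with_retry(call, max_attempts: int = 5, sleep=time.sleep, rng=random.random):
    """Run `call`, retrying on RateLimitError with full-jitter exponential backoff.

    `sleep` and `rng` are injectable so the behaviour can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random fraction of the current backoff ceiling.
            sleep(rng() * backoff_delay(attempt))
```

Jitter spreads retries from many pods over time, so replicas that hit the limit together do not all retry at the same instant.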

Slide 12

Slide 12 text

Kubernetes for Scale How can Kubernetes help us scale AI applications?

Slide 13

Slide 13 text

What is K8S? ✦ Open-source container automation framework ✦ Simple, modular and declarative ✦ Automated: Monitors, detects & rectifies issues ✦ Primitives ✦ Pods, Services, Volumes, Operators, Scheduling, etc. ✦ Focus ✦ External API AI applications

Slide 14

Slide 14 text

Scaling Strategies ✦ Vertical Pod Autoscaler (VPA) ✦ Adjusts resources within pods ✦ Horizontal Pod Autoscaler (HPA) ✦ Scales number of pods up/down based on load ✦ K8S Event-Driven Autoscaler (KEDA) ✦ Scales number of pods up/down based on events

Slide 15

Slide 15 text

Vertical Pod Autoscaler ✦ Adjusts CPU & memory for pods ✦ Recommender: ✦ Calculates optimal resource values by analyzing historical metrics ✦ Sets two parameters: Target utilization & min/max resources allowed ✦ Updater: ✦ Monitors recommendation changes ✦ Replaces pods when adjustments are needed
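The Recommender/Updater split can be illustrated with a toy model. This is a simplified stand-in, not the VPA's actual algorithm: it picks a percentile of historical usage, clamps it to the allowed min/max, and flags a pod for replacement when its current request drifts too far from the recommendation. The function names, the 90th percentile, and the 10% drift threshold are all assumptions made for the example.

```python
import math


def recommend_request(usage_samples, min_allowed, max_allowed, percentile=90):
    """Recommender (toy): percentile of historical usage, clamped to [min_allowed, max_allowed]."""
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    ordered = sorted(usage_samples)
    # Nearest-rank percentile: smallest value covering `percentile`% of the samples.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    target = ordered[rank - 1]
    return max(min_allowed, min(max_allowed, target))


def needs_update(current_request, recommended, threshold=0.1):
    """Updater (toy): replace the pod when the request is off by more than `threshold`."""
    return abs(current_request - recommended) / recommended > threshold


# CPU usage history in millicores (made-up numbers for illustration).
history = [100, 120, 150, 200, 400, 180, 160, 140, 130, 110]
recommended = recommend_request(history, min_allowed=100, max_allowed=300)
print(recommended, needs_update(100, recommended))
```

Using a high percentile rather than the mean keeps the request close to typical peaks while ignoring a single outlier spike such as the 400m sample.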

Slide 16

Slide 16 text

Horizontal Pod Autoscaler ✦ Adjusts number of pod replicas ✦ Based on workload ✦ Resource utilization (CPU/memory) ✦ Traffic-driven ✦ Scales both up and down ✦ Most common & widely used pattern
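The replica count the HPA aims for follows the formula in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that arithmetic; the function name and the min/max defaults are assumptions for illustration, and the real controller also handles pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count from the HPA scaling formula, clamped to [min_replicas, max_replicas]."""
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size to avoid flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
print(desired_replicas(4, current_metric=90, target_metric=60))
```

The same formula scales in: with 4 pods averaging 20% against a 60% target, the result is 2 pods.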

Slide 17

Slide 17 text

K8S Event-Driven Autoscaler ✦ Extends horizontal scaling (works alongside HPA) ✦ Based on custom events & metrics ✦ Ideal for: ✦ Event-driven architectures (message queues, event buses) ✦ Serverless-style applications (scale to zero) ✦ Workloads determined by data availability
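For a queue-backed worker, event-driven scaling can be pictured as "one replica per N pending messages, and zero replicas when the queue is empty". The sketch below models only that decision; it is not KEDA's implementation, and the function and parameter names are hypothetical, loosely mirroring the min/max replica counts and activation threshold that a scaler configuration would carry.

```python
import math


def replicas_for_queue(queue_length, target_per_replica,
                       min_replicas=0, max_replicas=20, activation_threshold=0):
    """Replica count for a worker pool driven by queue depth, with scale-to-zero."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if queue_length <= activation_threshold:
        # Nothing (or too little) to process: fall back to the floor, possibly zero.
        return min_replicas
    desired = math.ceil(queue_length / target_per_replica)
    # Once active, run at least one replica and never exceed the ceiling.
    return max(max(min_replicas, 1), min(max_replicas, desired))


# 42 pending jobs at 5 jobs per worker need 9 workers; an empty queue needs none.
print(replicas_for_queue(42, 5), replicas_for_queue(0, 5))
```

Scaling on queue depth instead of CPU suits inference workers that sit idle between bursts: the backlog grows before CPU usage does, so the pool reacts earlier.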

Slide 18

Slide 18 text

More Tools & Platforms ✦ We've only scratched the surface ✦ Core Tools ✦ MLflow ✦ Seldon Core ✦ KServe ✦ Runtimes ✦ NVIDIA Triton ✦ TensorFlow Serving

Slide 19

Slide 19 text

Questions? These Slides → shyr.io/t/scaling-ai-with-k8s More Talks → shyr.io/talks shyr.io @sheharyarn