Scaling AI Applications with Kubernetes

Google Developers Group Lahore ▪︎ September 2025
SCALING AI APPLICATIONS WITH KUBERNETES

Sheharyar Naseer
Systems architect & technology advisor for startups and enterprises.
Find me online @sheharyarn

Background
✦ Indie nomad software architect
✦ 15+ years of polyglot experience, focused on Web & Cloud
✦ Worked with Apple, Superlist, AllocatorOne, TheScore, Slab, etc.
✦ Stack Overflow: 75,000+ reputation (Top 5 in Pakistan)
✦ Author / contributor of multiple popular libraries & tools
✦ Featured on popular developer communities

Outline
✦ State of AI
✦ Scaling Challenges
✦ Using Kubernetes
✦ Q/A

State of AI
What does the AI landscape look like today?

The AI Explosion
✦ Unprecedented growth and use of AI
✦ 72% of businesses have adopted AI [Source]
✦ 700M+ people used AI apps in the first half of 2025 [Source]
✦ Market size projected to reach $1.34 trillion USD by 2030 [Source]
✦ Massive investments in startups and infrastructure
✦ Systems need to scale to handle demand

AI Spectrum
✦ External API calls (Gemini, Claude, OpenAI)
✦ Self-hosted or Custom
  ✦ LLMs
  ✦ RAG Systems (Vector, Embedding, and other services)
  ✦ Vision Inference (Detection, Segmentation, etc.)
  ✦ Traditional ML (Regression, Classification, Clustering, Anomalies, etc.)
  ✦ Generative models
✦ Hybrid Systems

LLM Dominance
✦ Highest visibility
✦ Mainstream adoption
✦ Platform effect
✦ All the hype
✦ Versatile applications
✦ Even used where it shouldn't be

Scaling Challenges
Common problems when scaling AI applications

Growing Pains
✦ Scaling systems is already hard
✦ Each type of AI workload brings its own set of challenges
✦ Challenges vary greatly by scope
✦ No single strategy can be applied everywhere

Examples
✦ Common
  ✦ Resource-bound operations
  ✦ Over/under-provisioned infrastructure
  ✦ Cost management
✦ External APIs (rate limits)
✦ LLMs (cold starts, scheduling, fragmentation)
✦ RAGs (data consistency, pipeline coordination, storage)
✦ Hybrid Systems (coordination + all of the above)

Kubernetes for Scale
How can Kubernetes help us scale AI applications?

What is K8S?
✦ Open-source platform for automating the deployment, scaling & management of containers
✦ Simple, modular and declarative
✦ Automated: monitors, detects & rectifies issues
✦ Primitives
  ✦ Pods, Services, Volumes, Operators, Scheduling, etc.
✦ Focus of this talk
  ✦ AI applications built on external APIs
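The "declarative" and "monitors, detects & rectifies" points both describe Kubernetes' control-loop model: you declare a desired state, and controllers repeatedly compare it with the observed state and act on the difference. Below is a minimal Python sketch of one such reconciliation pass, assuming an in-memory list of pod names stands in for the cluster. `Cluster` and `reconcile` are hypothetical names, not Kubernetes APIs; real controllers watch the API server rather than inspecting a local list.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the observed cluster state."""
    running_pods: list[str] = field(default_factory=list)
    _next_id: int = 0

    def start_pod(self, app: str) -> None:
        # Give every new pod a unique, increasing suffix.
        self.running_pods.append(f"{app}-{self._next_id}")
        self._next_id += 1

    def stop_pod(self) -> None:
        # Remove the most recently listed pod.
        self.running_pods.pop()


def reconcile(desired_replicas: int, app: str, cluster: Cluster) -> int:
    """One control-loop pass: converge observed state toward desired state.

    Returns the number of corrective actions taken."""
    actions = 0
    while len(cluster.running_pods) < desired_replicas:
        cluster.start_pod(app)  # too few pods: start replacements
        actions += 1
    while len(cluster.running_pods) > desired_replicas:
        cluster.stop_pod()  # too many pods: stop the extras
        actions += 1
    return actions
```

Calling `reconcile(3, "inference", cluster)` on an empty cluster starts three pods; removing one to simulate a crash and calling it again starts exactly one replacement. Running the same function over and over is safe because it only acts on the difference between desired and observed state.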

Scaling Strategies
✦ Vertical Pod Autoscaler (VPA)
  ✦ Adjusts resources within pods
✦ Horizontal Pod Autoscaler (HPA)
  ✦ Scales the number of pods up/down
✦ K8S Event-Driven Autoscaler (KEDA)
  ✦ Scales the number of pods up/down based on events

Vertical Pod Autoscaler
✦ Adjusts CPU & memory requests for pods
✦ Recommender:
  ✦ Calculates optimal resource values by analyzing historical metrics
  ✦ Recommends target values, kept within the min/max resources allowed
✦ Updater:
  ✦ Monitors recommendation changes
  ✦ Evicts pods when adjustments are needed, so they are recreated with updated resources
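The Recommender/Updater split can be illustrated with a simplified sketch. It assumes the recommendation is a high percentile of observed usage plus a safety margin, clamped to the min/max bounds, and that the updater only acts when the current request drifts past a threshold. The function names and the defaults (90th percentile, 15% margin, 10% threshold) are illustrative assumptions; the real VPA recommender is more involved (it aggregates usage into decaying histograms).

```python
import math


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile (q in (0, 100]) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def recommend(
    usage_history: list[float],
    min_allowed: float,
    max_allowed: float,
    q: float = 90.0,
    safety_margin: float = 0.15,
) -> float:
    """Hypothetical VPA-style recommender: a high percentile of observed usage,
    padded by a safety margin, then clamped to [min_allowed, max_allowed]."""
    raw = percentile(usage_history, q) * (1 + safety_margin)
    return min(max(raw, min_allowed), max_allowed)


def needs_update(current_request: float, recommended: float, threshold: float = 0.1) -> bool:
    """Hypothetical updater check: replace the pod only if its current request
    differs from the recommendation by more than `threshold` (relative)."""
    return abs(recommended - current_request) / current_request > threshold
```

For CPU samples in millicores such as `[200, 250, 300, 320, 400, 410, 450, 500, 520, 600]`, the 90th percentile is 520, so the padded recommendation is about 598m. A `max_allowed` of 500 caps it at 500m, and a pod currently requesting 500m would be flagged for replacement only if the recommendation moved by more than 10%.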

Horizontal Pod Autoscaler
✦ Adjusts the number of pod replicas
✦ Based on workload
  ✦ Resource utilization (CPU/memory)
  ✦ Traffic-driven
✦ Scales both up and down
✦ Most common & widely used pattern
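The HPA's core rule, as given in the Kubernetes documentation, is `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. Scaling is skipped when that ratio is within a tolerance (0.1 by default), and the result is bounded by the configured min/max replicas. This Python sketch encodes just that rule; it deliberately leaves out stabilization windows, missing-metric handling and pod readiness, and the function name and default bounds are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Replica count following the documented HPA formula:
    ceil(currentReplicas * currentMetricValue / desiredMetricValue),
    skipped when the ratio is within `tolerance` of 1.0, then clamped."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: leave replicas alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(desired, max_replicas))
```

For example, 3 pods averaging 90% CPU against a 60% target gives `desired_replicas(3, 90, 60) == 5`, while 4 pods at 63% against the same target stay at 4 because the 1.05 ratio falls inside the tolerance.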

K8S Event-Driven Autoscaler
✦ Scales workloads horizontally, building on top of HPA
✦ Based on custom events & metrics
✦ Ideal for:
  ✦ Event-driven architectures (message queues, event buses)
  ✦ Serverless-style applications (scale to zero)
  ✦ Workloads determined by data availability
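KEDA attaches a scaler (for example, a message queue) to a workload. It handles activation between zero and one replica itself and hands 1-to-N scaling to an HPA it manages, which targets a per-replica metric value. The sketch below collapses both steps into one hypothetical function, assuming a target of `messages_per_replica` pending messages per pod; the names and defaults are illustrative, not KEDA configuration fields.

```python
import math


def replicas_for_queue(
    queue_length: int,
    messages_per_replica: int,
    min_replicas: int = 0,
    max_replicas: int = 20,
) -> int:
    """Hypothetical event-driven scaling rule: one replica per
    `messages_per_replica` pending messages, idling at `min_replicas`
    (zero by default) when the queue is empty."""
    if queue_length <= 0:
        return min_replicas  # no pending work: scale down to the floor, possibly zero
    desired = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(desired, max_replicas))
```

With 95 pending messages and a target of 10 per replica this yields 10 pods; with an empty queue it yields 0, which is the scale-to-zero behavior listed above. Setting `min_replicas=1` trades that saving for avoiding a cold start on the next message.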

More Tools & Platforms
✦ We've only scratched the surface
✦ Core Tools
  ✦ MLflow
  ✦ Seldon Core
  ✦ KServe
✦ Runtimes
  ✦ NVIDIA Triton
  ✦ TensorFlow Serving

Questions?
These Slides → shyr.io/t/scaling-ai-with-k8s
More Talks → shyr.io/talks
shyr.io
@sheharyarn