Slide 1

Slide 1 text

The Weight of Data: Rethinking Cloud-Native Systems for the Age of AI
Vincent Caldeira, Holly Cummins
Red Hat
KubeCon & CloudNativeCon, April 3, 2024

Slide 2

Slide 2 text

Title 2022

Slide 3

Slide 3 text

Title 2022

Slide 4

Slide 4 text

Title 2022

Slide 5

Slide 5 text

2024

Slide 6

Slide 6 text

2024

Slide 7

Slide 7 text

2024

Slide 8

Slide 8 text

A new era of intelligent, data-driven systems
[Diagram labels: tools, smaller models, orchestration model]

Slide 9

Slide 9 text

A new era of intelligent, data-driven systems
[Diagram labels: tools, smaller models, orchestration model, agents]

Slide 10

Slide 10 text

A new era of intelligent, data-driven systems
[Diagram labels: tools, smaller models, orchestration model]

Slide 11

Slide 11 text

but …

Slide 12

Slide 12 text

data gravity
[Diagram labels: data, nodes]

Slide 13

Slide 13 text

data gravity

Slide 14

Slide 14 text

data gravity

Slide 15

Slide 15 text

Cloud-native infrastructure was built for stateless microservices. AI is breaking that paradigm.

Slide 16

Slide 16 text

The question: How do we evolve Kubernetes to be AI-native?

Slide 17

Slide 17 text

Kubernetes already supports some stateful workloads.

Slide 18

Slide 18 text

Kubernetes already supports some stateful workloads.

Slide 19

Slide 19 text

Kubernetes already supports some stateful workloads.

Slide 20

Slide 20 text

Kubernetes already supports some stateful workloads.

Slide 21

Slide 21 text

CNCF projects have made databases work at scale.

Slide 22

Slide 22 text

CNCF projects have made databases work at scale.
- Vitess

Slide 23

Slide 23 text

CNCF projects have made databases work at scale.
- Vitess

Slide 24

Slide 24 text

CNCF projects have made databases work at scale.
- Vitess

Slide 25

Slide 25 text

CNCF projects have made databases work at scale.
- Vitess

Slide 26

Slide 26 text

CNCF projects have made databases work at scale.
- Vitess
- Rook

Slide 27

Slide 27 text

CNCF projects have made databases work at scale.
- Vitess
- Rook
- K8s-native storage

Slide 28

Slide 28 text

Event-driven architectures (Knative, Kafka on K8s) enable streaming intelligence.

Slide 29

Slide 29 text

Event-driven architectures (Knative, Kafka on K8s) enable streaming intelligence.

Slide 30

Slide 30 text

But AI workloads go beyond traditional stateful applications.

Slide 31

Slide 31 text

AI agents don’t just store state

Slide 32

Slide 32 text

AI agents don’t just store state; they share

Slide 33

Slide 33 text

AI agents don’t just store state; they share, modify

Slide 34

Slide 34 text

AI agents don’t just store state; they share, modify, and react to data

Slide 35

Slide 35 text

AI agents don’t just store state; they share, modify, and react to data across distributed nodes.

Slide 36

Slide 36 text

Current PV and StatefulSet models were not designed for dynamic, multi-agent AI coordination.

Slide 37

Slide 37 text

Scaling state across nodes is limited—data locality, synchronization, and performance bottlenecks emerge at extreme scale.

Slide 38

Slide 38 text

Scaling state across nodes is limited—data locality, synchronization, and performance bottlenecks emerge at extreme scale.

Slide 39

Slide 39 text

Scaling state across nodes is limited—data locality, synchronization, and performance bottlenecks emerge at extreme scale.

Slide 40

Slide 40 text

Scaling state across nodes is limited—data locality, synchronization, and performance bottlenecks emerge at extreme scale.

Slide 41

Slide 41 text

We need:
- AI-native scheduling
- State management

Slide 42

Slide 42 text

We need: NUMA-aware AI scheduling

Slide 43

Slide 43 text

We need: NUMA-aware AI scheduling

Slide 44

Slide 44 text

We need: GPU and topology-aware AI scheduling
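
To make the idea of NUMA- and topology-aware placement concrete, here is a minimal sketch. It is not how any Kubernetes scheduler or CNCF project implements this; the NumaDomain model, the scoring values, and the node names are all assumptions made up for illustration. The point is only that a placement which fits inside a single NUMA domain is preferred over one that spans domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaDomain:
    """Free resources in one NUMA domain of a node (hypothetical model)."""
    free_gpus: int
    free_cpus: int


def score_node(domains, gpus, cpus):
    """Score a node for a request of `gpus` GPUs and `cpus` CPUs.

    2 = the request fits inside a single NUMA domain,
    1 = it fits only by spanning domains,
    0 = it does not fit on this node at all.
    """
    if any(d.free_gpus >= gpus and d.free_cpus >= cpus for d in domains):
        return 2
    total_gpus = sum(d.free_gpus for d in domains)
    total_cpus = sum(d.free_cpus for d in domains)
    if total_gpus >= gpus and total_cpus >= cpus:
        return 1
    return 0


def pick_node(nodes, gpus, cpus):
    """Return the name of the best-scoring node, or None if nothing fits.

    Ties are broken by node name so the result is deterministic.
    """
    best_name, best_score = None, 0
    for name, domains in sorted(nodes.items()):
        score = score_node(domains, gpus, cpus)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical cluster: node-a splits its GPUs across two domains,
# node-b has both GPUs in the same domain.
nodes = {
    "node-a": [NumaDomain(free_gpus=1, free_cpus=8), NumaDomain(free_gpus=1, free_cpus=8)],
    "node-b": [NumaDomain(free_gpus=2, free_cpus=16), NumaDomain(free_gpus=0, free_cpus=4)],
}
```

With this toy data, a request for two GPUs goes to node-b, because node-a could only serve it by crossing NUMA domains.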

Slide 45

Slide 45 text

LLM Gateway

Slide 46

Slide 46 text

We need:

Slide 47

Slide 47 text

We need: Fault-tolerant recovery mechanisms to preserve AI state beyond pod restarts
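
One simple way to picture state that outlives a pod restart is a checkpoint written to storage that survives the pod, such as a path backed by a persistent volume. The sketch below is an assumption-laden illustration, not a mechanism from the talk or from any CNCF project: the function names, the JSON format, and the file path are hypothetical. It writes to a temporary file and renames it into place, so a crash mid-write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Atomically write `state` as JSON to `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # Clean up the partial temp file; the old checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

An agent would call load_checkpoint once at startup and save_checkpoint after each meaningful step; where the file lives (here assumed to be a volume that survives the pod) decides how much of a failure it can ride out.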

Slide 48

Slide 48 text

CNCF projects that provide a foundation:

Slide 49

Slide 49 text

CNCF projects that provide a foundation:
- Kueue

Slide 50

Slide 50 text

CNCF projects that provide a foundation:
- Kueue
- Envoy AI Gateway

Slide 51

Slide 51 text

CNCF projects that provide a foundation:
- Kueue
- Envoy AI Gateway
- KServe + vLLM

Slide 52

Slide 52 text

CNCF projects that provide a foundation:
- Kueue
- Envoy AI Gateway
- KServe + vLLM
- Dapr + Dapr Agents

Slide 53

Slide 53 text

CNCF projects that provide a foundation:
- Kueue
- Envoy AI Gateway
- KServe + vLLM
- Dapr + Dapr Agents
- OpenTelemetry

Slide 56

Slide 56 text

The CNCF has the foundation to lead the shift towards decentralised AI.

Slide 57

Slide 57 text

AI will change cloud-native. It will be intelligent, state-aware, and distributed. Let’s build it together.

Slide 58

Slide 58 text

Come see us at our booth
Vaishnavi Natale
More from Vincent: Green AI in Cloud Native Ecosystems: Strategies for Sustainability and Efficiency
Vincent Caldeira & Tamar Eilam
15:15 Friday, Level 1 | Hall Entrance S10 | Room A
red.ht/KubeConEU-Keynote
Live demo: “Build your own distributed cloud native AI agent in 20 minutes”
15:30 today, Demo Theater