Slide 1

Slide 1 text

Adventures In Optimizing The Kubernetes API Server
Madhav Jivrajani

Slide 2

Slide 2 text

$ whoami
● Gets super excited about systems-y stuff
● Works @ VMware (Kubernetes Upstream)
● Within the Kubernetes community - SIG-{API Machinery, Scalability, Architecture, ContribEx}.
  ○ Please reach out if you’d like to get started in the community!
● Doing Go stuff for ~3 years, particularly love things around the Go runtime!

Slide 3

Slide 3 text

Agenda
● 50,000 ft view of how Watches work.
● Watch Cache - zooming in juuuust a little bit.
● What were the problems that existed?
● What did we end up doing?
● A few results!

Slide 4

Slide 4 text

Overview

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Give me all changes since resourceVersion x and stream them on the same connection!
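
That request is what a watch is. Below is a minimal client-go sketch of opening one; the namespace, resource, and the resourceVersion value "12345" are illustrative, not from the talk:

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from the local kubeconfig.
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    client, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // "Give me all changes to Pods in 'default' since resourceVersion 12345
    // and stream them on this connection."
    w, err := client.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{
        ResourceVersion: "12345",
    })
    if err != nil {
        panic(err)
    }
    defer w.Stop()

    // Each change arrives as an event on the same streaming connection.
    for event := range w.ResultChan() {
        fmt.Printf("%s %v\n", event.Type, event.Object.GetObjectKind().GroupVersionKind())
    }
}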

Slide 7

Slide 7 text

Zooming In A Little Bit 🔎

Slide 8

Slide 8 text

Prelude - Initial Implementation of Watch

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Current Design

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Looks good! So, what’s the problem?

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

But…

Slide 20

Slide 20 text

No content

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

“Copying over” Includes:
● Allocating the result buffer of the desired size.
  ○ “Desired size” in real-world scenarios can get substantially large.
● Iterating over the list of items and copying them into the buffer.
● Keep in mind, all of this happens under a lock…
● … while other goroutines wait for this copying over to complete.
● And these waiting goroutines in turn will have their own copying to do.
● Because of this, we end up with spikes in memory consumption for watches opened against the “past”.
  ○ We also end up wasting a few CPU cycles.
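
A rough sketch of the pattern described above, with made-up types rather than the actual apiserver code: the result buffer is allocated and filled while the cache's lock is held, so its size (and the time other watchers spend waiting) grows with how far in the past the watch starts.

package sketch

import "sync"

type event struct {
    resourceVersion uint64
}

// watchCache keeps the most recent events in a fixed-size ring buffer.
type watchCache struct {
    sync.RWMutex
    cache      []event // ring buffer of recent events
    startIndex int     // oldest event still held
    endIndex   int     // next slot to write to
}

// eventsSince is roughly what serving a watch "from the past" used to look
// like: every matching event is copied into a freshly allocated result
// buffer, and both the allocation and the copy happen with the lock held,
// so other watchers queue up behind it.
func (w *watchCache) eventsSince(rv uint64) []event {
    w.RLock()
    defer w.RUnlock()

    // Find the first cached event newer than rv (simplified scan).
    first := w.startIndex
    for first < w.endIndex && w.cache[first%len(w.cache)].resourceVersion <= rv {
        first++
    }

    // The "desired size" grows with how far in the past the watch starts.
    result := make([]event, 0, w.endIndex-first)
    for i := first; i < w.endIndex; i++ {
        result = append(result, w.cache[i%len(w.cache)])
    }
    return result
}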

Slide 24

Slide 24 text

The Solution

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No content

Slide 28

Slide 28 text

“Copying over” Includes:
● Constructing the watchCacheInterval
  ○ This includes:
    ■ Calculating start and end indices - uses binary search: fast!
    ■ Allocating an internal buffer of constant size for further optimization.
● This means we limit the memory consumption for watches from the past to a constant amount in 99% of the cases.
  ○ The remaining 1% is for special cases like resource version 0.
  ○ In this case, the performance is the same as before this change - no improvements.
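
Continuing the same illustrative sketch (the construction here is simplified, not the real watchCacheInterval code): the interval only records which slice of the cache it should serve, found via binary search, and allocates a single constant-size chunk buffer up front.

package sketch

import "sort"

// chunkSize is a small constant (illustrative value): the up-front
// allocation for serving a watch from the past no longer grows with how
// far back the watch starts.
const chunkSize = 100

type watchCacheInterval struct {
    startIndex int     // next cache slot this interval will serve
    endIndex   int     // one past the last slot it may serve
    buffer     []event // constant-size chunk, refilled as events are consumed
}

// newCacheInterval finds the window of events newer than rv with a binary
// search and allocates only the constant-size chunk buffer; events are
// copied into it lazily, in chunks, instead of all at once under the lock.
func (w *watchCache) newCacheInterval(rv uint64) *watchCacheInterval {
    w.RLock()
    defer w.RUnlock()

    size := w.endIndex - w.startIndex
    // Binary search over the cached events for the first one newer than rv.
    offset := sort.Search(size, func(i int) bool {
        return w.cache[(w.startIndex+i)%len(w.cache)].resourceVersion > rv
    })
    return &watchCacheInterval{
        startIndex: w.startIndex + offset,
        endIndex:   w.endIndex,
        buffer:     make([]event, 0, chunkSize),
    }
}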

Slide 29

Slide 29 text

Considerations With “Async”
● Considering the interval is used to serve events asynchronously, what do we need to keep in mind?
● Prelude:
  ○ As the watchCache becomes full, events are popped off - this is called “propagation”.
  ○ The event to be popped off is tracked by an index internally called the startIndex.
  ○ The interval also tracks the event to be served using an index called startIndex (different entities, same name!)
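
Sticking with the illustrative cache from earlier, "propagation" can be sketched as the add path: once the ring buffer is full, writing a new event pops the oldest one off and advances the cache's own startIndex (distinct from the interval's startIndex, despite the shared name).

package sketch

// add appends a new event to the ring buffer. When the buffer is full,
// the oldest event is popped off ("propagation") and the cache's own
// startIndex advances - this is the index that can race ahead of a slow
// interval.
func (w *watchCache) add(e event) {
    w.Lock()
    defer w.Unlock()

    if w.endIndex-w.startIndex == len(w.cache) {
        w.startIndex++ // buffer full: pop the oldest event off
    }
    w.cache[w.endIndex%len(w.cache)] = e
    w.endIndex++
}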

Slide 30

Slide 30 text

Considerations With “Async”

Slide 31

Slide 31 text

Considerations With “Async”

Slide 32

Slide 32 text

Considerations With “Async”

Slide 33

Slide 33 text

Considerations With “Async” If propagation happens faster than the interval serves events…

Slide 34

Slide 34 text

Considerations With “Async”

Slide 35

Slide 35 text

Considerations With “Async”

Slide 36

Slide 36 text

Considerations With “Async”

Slide 37

Slide 37 text

Considerations With “Async”

Slide 38

Slide 38 text

Considerations With “Async”

Slide 39

Slide 39 text

Considerations With “Async” Once the interval is invalidated, we stop the Watch.
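
In the same sketch, the invalidation check reduces to comparing the two startIndex values: if the cache's index has moved past the interval's, events the interval still owes its watcher have already been popped off, so the interval is invalidated and the watch is stopped.

package sketch

// invalidated reports whether propagation has overtaken the interval: the
// cache has already popped off events this interval has not served yet,
// so it can no longer produce a consistent stream and the watch must be
// stopped. (Assumes the caller holds the cache's lock.)
func (i *watchCacheInterval) invalidated(w *watchCache) bool {
    return i.startIndex < w.startIndex
}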

Slide 40

Slide 40 text

A Few Results

Slide 41

Slide 41 text

Benchmark

Slide 42

Slide 42 text

Memory Usage
Cumulative total memory in bytes (worst case):
● Before: 819200
● After: 1024

Slide 43

Slide 43 text

Memory Usage

Slide 44

Slide 44 text

Memory Profile Before:

Slide 45

Slide 45 text

Memory Profile After:

Slide 46

Slide 46 text

CPU Profile Before:

Slide 47

Slide 47 text

CPU Profile After:

Slide 48

Slide 48 text

CPU Profile After:

Slide 49

Slide 49 text

Future Work
● Paginating the watchCache itself
  ○ BTree-based backing cache
  ○ Copy-on-Write semantics
● Caching serialization/deserialization

Slide 50

Slide 50 text

References
● Life of A Kubernetes Watch Event
● PRs implementing this change: #1, #2
● Tracking issue for future work

Slide 51

Slide 51 text

Thank you!
Twitter: @MadhavJivrajani
K8s slack: @madhav