
Adventures In Optimizing The Kubernetes API Server

Slides for a talk given at the Go Bangalore Meetup, Sept 3 2022.

Madhav Jivrajani

September 06, 2022

Transcript

  1. $ whoami
     • Gets super excited about systems-y stuff
     • Work @ VMware (Kubernetes Upstream)
     • Within the Kubernetes community - SIG-{API Machinery, Scalability, Architecture, ContribEx}.
       ◦ Please reach out if you’d like to get started in the community!
     • Doing Go stuff for ~3 years, particularly love things around the Go runtime!
  2. Agenda
     • 50,000 ft view of how Watches work.
     • Watch Cache - zooming in juuuust a little bit.
     • What were the problems that existed?
     • What did we end up doing?
     • A few results!
  3. “Copying over” Includes:
     • Allocating the result buffer of the desired size.
       ◦ The “desired size” in real-world scenarios can get substantially large.
     • Iterating over the list of items and copying them into the buffer.
     • Keep in mind, all of this happens under a lock…
     • … while other goroutines wait for this copying to complete.
     • And these waiting goroutines in turn will have their own copying to do.
     • Because of this, we end up with spikes in memory consumption for watches opened against the “past”.
       ◦ We also end up wasting a few CPU cycles.
     (The copy-under-lock pattern is sketched below.)
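To make the slide above more concrete, here is a minimal sketch of the old behaviour, not the actual apiserver code: serving a watch from a past resourceVersion allocates the result buffer and copies the matching events while holding the cache lock. The event and watchCache types and the eventsSince function are simplified stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// event and watchCache are simplified stand-ins for the real apiserver types.
type event struct {
	resourceVersion uint64
	object          string
}

type watchCache struct {
	mu     sync.Mutex
	events []event // recent events, ordered by resourceVersion
}

// eventsSince mirrors the old behaviour: find where the requested
// resourceVersion falls, allocate a result buffer of the "desired size",
// and copy every matching event into it, all while holding the lock.
// Other goroutines that want to open their own watches block here,
// and each has its own copy to do once it gets the lock.
func (c *watchCache) eventsSince(rv uint64) []event {
	c.mu.Lock()
	defer c.mu.Unlock()

	// First event newer than rv (linear scan for brevity).
	start := 0
	for start < len(c.events) && c.events[start].resourceVersion <= rv {
		start++
	}

	// For a watch opened far in the "past", this buffer can be large,
	// which is where the memory spikes come from.
	result := make([]event, len(c.events)-start)
	copy(result, c.events[start:])
	return result
}

func main() {
	c := &watchCache{events: []event{{1, "a"}, {2, "b"}, {3, "c"}}}
	fmt.Println(len(c.eventsSince(1))) // 2: events 2 and 3 were copied under the lock
}
```

Every goroutine opening a watch from the past serializes on that lock and then performs its own potentially large copy, which is where the memory spikes and wasted CPU cycles come from.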
  4. “Copying over” Includes:
     • Constructing the watchCacheInterval.
       ◦ This includes:
         ▪ Calculating start and end indices - uses binary search: fast!
         ▪ Allocating an internal buffer of constant size for further optimization.
     • This means we limit the memory consumption for watches from the past to a constant amount in 99% of the cases.
       ◦ The remaining 1% is for special cases like resource version 0.
       ◦ In this case, the performance is the same as before this change - no improvements.
     (A simplified interval is sketched below.)
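The sketch below illustrates, under simplifying assumptions, the interval-based approach this slide describes: the constructor binary-searches the cache for the start position, and events are then copied out in small, constant-size chunks. watchCacheInterval is the upstream name the slide uses, but the fields, the chunkSize constant, and the Next method here are hypothetical simplifications, not the exact upstream API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type event struct {
	resourceVersion uint64
	object          string
}

// chunkSize is the constant-size internal buffer the slide mentions: memory
// used per watch from the past no longer grows with how far back it starts.
const chunkSize = 100

// watchCacheInterval is a simplified take on the interval described above.
type watchCacheInterval struct {
	mu         *sync.Mutex
	cache      []event // the cache's event log (simplified, non-ring view)
	startIndex int     // next cache position this interval will serve
	endIndex   int     // one past the last position it may serve
	buffer     []event // at most chunkSize events copied at a time
}

// newInterval uses binary search to find where the requested resourceVersion
// falls; no large up-front copy is made any more.
func newInterval(mu *sync.Mutex, cache []event, rv uint64) *watchCacheInterval {
	start := sort.Search(len(cache), func(i int) bool {
		return cache[i].resourceVersion > rv
	})
	return &watchCacheInterval{
		mu:         mu,
		cache:      cache,
		startIndex: start,
		endIndex:   len(cache),
		buffer:     make([]event, 0, chunkSize),
	}
}

// Next hands out one event at a time, refilling the small buffer under the
// lock only when it runs dry, so the lock is held for short, bounded copies.
func (i *watchCacheInterval) Next() (event, bool) {
	if len(i.buffer) == 0 {
		i.mu.Lock()
		n := i.endIndex - i.startIndex
		if n > chunkSize {
			n = chunkSize
		}
		i.buffer = append(i.buffer[:0], i.cache[i.startIndex:i.startIndex+n]...)
		i.startIndex += n
		i.mu.Unlock()
	}
	if len(i.buffer) == 0 {
		return event{}, false
	}
	ev := i.buffer[0]
	i.buffer = i.buffer[1:]
	return ev, true
}

func main() {
	var mu sync.Mutex
	cache := []event{{1, "a"}, {2, "b"}, {3, "c"}}
	it := newInterval(&mu, cache, 1)
	for ev, ok := it.Next(); ok; ev, ok = it.Next() {
		fmt.Println(ev.resourceVersion)
	}
}
```

Because each refill copies at most chunkSize events, lock hold times and per-watch memory stay bounded regardless of how far in the past the watch starts (except for the resource version 0 style special cases mentioned above).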
  5. Considerations With “Async”
     • Considering the interval is used to serve events asynchronously, what do we need to keep in mind?
     • Prelude:
       ◦ As and when the watchCache becomes full, events are popped off - this is called “propagation”.
       ◦ The event to be popped off is tracked by an index internally called the startIndex.
       ◦ The interval tracks the event to be served also using an index called startIndex (different entities, same name!)
     (The resulting consistency check is sketched below.)
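A hypothetical sketch of the guard this asynchrony implies: before the interval refills from the cache, it has to confirm that propagation has not already popped the events it still intends to serve. The names cacheState, intervalState and checkBeforeRefill are illustrative; only the two same-named startIndex counters come from the slide.

```go
package main

import (
	"errors"
	"fmt"
)

// Two different startIndex fields, as the slide warns: one belongs to the
// watchCache (the oldest event it still holds), the other to the interval
// (the next event it wants to serve).
type cacheState struct {
	startIndex int // advanced whenever the cache "propagates" (pops an old event)
	endIndex   int
}

type intervalState struct {
	startIndex int // next cache position the interval intends to serve
	endIndex   int
}

var errEventsDropped = errors.New("cache propagated past the interval's position")

// checkBeforeRefill is the kind of guard the asynchronous design needs:
// between refills, the cache may have popped off events the interval has
// not served yet, in which case the interval can no longer be trusted.
func checkBeforeRefill(c cacheState, i intervalState) error {
	if c.startIndex > i.startIndex {
		return errEventsDropped
	}
	return nil
}

func main() {
	c := cacheState{startIndex: 5, endIndex: 20} // events below index 5 were propagated
	i := intervalState{startIndex: 3, endIndex: 10}
	if err := checkBeforeRefill(c, i); err != nil {
		fmt.Println(err) // this watch can no longer be served from the cache
	}
}
```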
  6. Future Work
     • Paginating the watchCache itself
       ◦ BTree-based backing cache
       ◦ Copy-on-write semantics
     • Caching serialization/deserialization
  7. References
     • Life of A Kubernetes Watch Event
     • PRs implementing this change: #1, #2
     • Tracking issue for future work