Kubecon EU 2018

Envoy internals deep dive
Matt Klein / @mattklein123, Software Engineer @Lyft

Agenda
• Envoy design goals
• Architecture overview
• Threading model
• Hot restart
• Stats
• Q&A

What is Envoy?
The network should be transparent to applications. When network and application problems do occur, it should be easy to determine the source of the problem.

Envoy design goals
• Out of process architecture
• Low latency, high performance, dev productivity
• L3/L4 filter architecture
• HTTP L7 filter architecture
• HTTP/2 first
• Service/config discovery
• Active/passive health checking
• Advanced load balancing
• Best in class observability
• Service/middle/edge proxy
• Hot restart

Envoy architecture diagram
[Diagram labels: Worker; Connection; Listener filters; TCP filter manager; TCP read filters; TCP write filters; HTTP conn manager; HTTP codec; HTTP read filters; HTTP write filters; Service router; Upstream conn pool; Backend services; Stats; Admin; Cluster/Listener/Route Manager; xDS API]

Envoy threading model (c10k)
• Connection per thread does not scale
• Scaling requires many connections per thread: “c10k”
• Requires asynchronous programming paradigms: harder
[Diagram: one Thread per Connection, contrasted with one Thread running an Event loop that serves many Connections]
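The "many connections per thread" idea can be sketched with a readiness-based event loop. This is a minimal illustration using Python's standard `selectors` module, not Envoy's own event loop; the function name `run_echo_loop` and the use of socket pairs as stand-in connections are assumptions made for the example.

```python
import selectors
import socket

def run_echo_loop(server_socks, expected_messages):
    """Serve many connections from one thread using a readiness-based event loop."""
    sel = selectors.DefaultSelector()
    for sock in server_socks:
        sock.setblocking(False)  # the loop must never block on a single connection
        sel.register(sock, selectors.EVENT_READ)
    handled = 0
    while handled < expected_messages:
        # Wait until at least one connection is readable, then service each in turn.
        for key, _events in sel.select(timeout=1.0):
            data = key.fileobj.recv(1024)
            if data:
                key.fileobj.sendall(data.upper())
                handled += 1
    sel.close()
    return handled

# Three stand-in "connections", all serviced by the single calling thread.
pairs = [socket.socketpair() for _ in range(3)]
for i, (client, _server) in enumerate(pairs):
    client.sendall(f"conn{i}".encode())
handled = run_echo_loop([server for _client, server in pairs], expected_messages=3)
replies = [client.recv(1024) for client, _server in pairs]
for client, server in pairs:
    client.close()
    server.close()
```

The cost the slide calls "harder" shows up even here: per-connection logic has to be written as short, non-blocking steps driven by readiness events rather than as straight-line code.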

Envoy threading model (overview)
• Main thread handles non-data plane misc tasks
• Worker thread(s) embarrassingly parallel and handle listeners, connections, and proxying
• File flush threads avoid blocking
• Designed to be 100% non-blocking
• Designed to scale to massive parallelism (# of HW threads)
[Diagram labels: Main thread; Worker thread(s); File flush thread(s); Listeners; xDS; Runtime; Stat flush; Admin; Connections; Process management]

Envoy threading model (RCU)
• RCU = Read-Copy-Update
• Synchronization primitive heavily used in the Linux kernel
• Scales extremely well for R/W locking that is read heavy
[Diagram labels: Reader; Updater; Event loop; Ref-counted Data; Copy; New Ref-counted Data; “Quiescent period”]
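A rough sketch of the read-copy-update pattern, assuming an immutable-by-convention snapshot that is replaced by swapping a single reference. The class name `RcuCell` and the route table contents are hypothetical. In this sketch Python's own reference counting stands in for the ref-counted data on the slide: the old snapshot lives until the last reader drops it.

```python
import threading

class RcuCell:
    """Holds a reference to a read-only snapshot; readers never take a lock."""

    def __init__(self, initial):
        self._current = initial              # published snapshot, treated as read-only
        self._write_lock = threading.Lock()  # serializes updaters only

    def read(self):
        # A reader just grabs the current reference. Holding it keeps the
        # old snapshot alive until the reader is done with it.
        return self._current

    def update(self, mutate):
        with self._write_lock:
            new = dict(self._current)  # copy
            mutate(new)                # update the private copy
            self._current = new        # publish with a single reference swap

routes = RcuCell({"/": "cluster_a"})
held = routes.read()  # a reader in the middle of its work
routes.update(lambda table: table.update({"/api": "cluster_b"}))
fresh = routes.read()  # later readers see the new snapshot
```

The reader that obtained `held` before the update keeps a consistent view, while new readers see `fresh`. Only updaters pay for synchronization, which is why the pattern suits read-heavy data.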

Envoy threading model (TLS and RCU)
• TLS = Thread Local Storage
• TLS slots can be allocated dynamically by objects
• RCU is used to post shared read-only data from the main thread to workers
[Diagram: Main and Worker 1 through Worker 4 each hold “Slots” numbered 0 to 5; labels: Event loop, post(), TLS get()]
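The slot mechanism can be approximated as follows. This is a sketch under assumptions, not Envoy's API: `Dispatcher`, `SlotAllocator`, and their methods are hypothetical names, and a queue-driven thread stands in for a worker event loop. Each worker owns a slot table that only it reads or writes, and the main thread updates a slot by posting a callback to every worker.

```python
import queue
import threading

class Dispatcher:
    """Hypothetical per-thread event loop that runs posted callbacks in order."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.slots = {}  # this thread's slot table: slot index -> shared data
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def post(self, callback):
        self._tasks.put(callback)

    def _run(self):
        while True:
            callback = self._tasks.get()
            if callback is None:  # sentinel: stop the loop
                return
            callback(self)

    def stop(self):
        self._tasks.put(None)
        self._thread.join()

class SlotAllocator:
    """Main-thread helper that hands out slot indices and posts updates."""

    def __init__(self, workers):
        self._workers = workers
        self._next_index = 0

    def allocate(self):
        index = self._next_index
        self._next_index += 1
        return index

    def set(self, index, data):
        def apply(dispatcher):
            dispatcher.slots[index] = data  # runs on the worker's own thread
        for worker in self._workers:
            worker.post(apply)

workers = [Dispatcher() for _ in range(4)]
tls = SlotAllocator(workers)
slot = tls.allocate()
hosts = ("10.0.0.1:80", "10.0.0.2:80")  # shared, read-only data
tls.set(slot, hosts)

seen, seen_lock = [], threading.Lock()

def read_slot(dispatcher):
    value = dispatcher.slots[slot]  # lock-free: the table is thread-local
    with seen_lock:                 # the lock only protects the test's result list
        seen.append(value)

for worker in workers:
    worker.post(read_slot)
for worker in workers:
    worker.stop()
```

Because callbacks on a given worker run in post order, each worker's read happens after its own update, with no locking on the read path.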

Envoy threading model (cluster updates example)
• Complete example of TLS and RCU for cluster updates
[Diagram labels, spanning Main and Worker: (1) Cluster manager; (2) Health checker; (3) xDS/DNS; (4) Worker event loop; (5) Post handler / TLS update; (6) TLS; (7) IO event / load balancer]
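The labelled flow can be imitated in a small deterministic simulation. Everything here is an assumption made for illustration: the class and method names are hypothetical, the worker event loop is simulated by an explicit `run_pending()` call instead of a real thread, and round robin is just one possible load-balancing policy.

```python
from collections import deque

class Worker:
    def __init__(self):
        self._pending = deque()  # callbacks posted by the main thread
        self.hosts = ()          # this worker's local, read-only host snapshot
        self._rr = 0

    def post(self, callback):
        self._pending.append(callback)

    def run_pending(self):
        # One turn of the simulated worker event loop.
        while self._pending:
            self._pending.popleft()(self)

    def pick_host(self):
        # Load balancing reads only worker-local state, so it needs no lock.
        host = self.hosts[self._rr % len(self.hosts)]
        self._rr += 1
        return host

class ClusterManager:
    """Runs on the main thread; owns membership and health state."""

    def __init__(self, workers):
        self._workers = workers
        self._members = []
        self._unhealthy = set()

    def on_discovery(self, members):  # e.g. driven by xDS or DNS
        self._members = list(members)
        self._publish()

    def on_health_check(self, host, healthy):
        if healthy:
            self._unhealthy.discard(host)
        else:
            self._unhealthy.add(host)
        self._publish()

    def _publish(self):
        # Build a fresh immutable snapshot and post it to every worker.
        snapshot = tuple(h for h in self._members if h not in self._unhealthy)
        for worker in self._workers:
            worker.post(lambda w, s=snapshot: setattr(w, "hosts", s))

workers = [Worker(), Worker()]
manager = ClusterManager(workers)
manager.on_discovery(["a:80", "b:80", "c:80"])
for w in workers:
    w.run_pending()
first = workers[0].pick_host()

manager.on_health_check("b:80", healthy=False)
stale = workers[0].hosts  # unchanged until the worker's loop runs the post
for w in workers:
    w.run_pending()
picks = [workers[0].pick_host() for _ in range(3)]
```

Each worker keeps serving from its old snapshot until its own loop applies the posted update, so the data path never waits on the main thread.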

Envoy hot restart (overview)
• Full binary reload without dropping any connections
• Very useful in legacy/non-container scheduler worlds
[Diagram: Rolling deploy vs. Hot restart deploy of Service A -> A’ behind a Load balancer; labels: 33%, 67%]

Envoy hot restart (mechanics)
• Stats/locks kept in shared memory
• Simple RPC protocol over Unix domain sockets (UDS)
• Sockets, some stats, etc. passed between processes
• Built for containers
[Diagram labels: Primary process (Stats, Logs, Hot restarter); Secondary process (Stats, Logs, Hot restarter); Shared memory (Stats, Locks); UDS]
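Passing a socket between processes over a Unix domain socket relies on the operating system's descriptor-passing facility (SCM_RIGHTS). The sketch below shows only that primitive, using Python's `socket.send_fds` and `socket.recv_fds` (Unix only, Python 3.9+). It is not Envoy's RPC protocol: the helper names and the `b"listener"` message are hypothetical, and both ends run in one process to keep the example self-contained.

```python
import socket

def send_listener(channel, listener):
    # The descriptor travels as ancillary data; the receiver gets its own
    # descriptor that refers to the same underlying listening socket.
    socket.send_fds(channel, [b"listener"], [listener.fileno()])

def recv_listener(channel):
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return socket.socket(fileno=fds[0])

# The two ends of the UDS stand in for the old and new processes.
old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()

send_listener(old_side, listener)
inherited = recv_listener(new_side)
same_address = inherited.getsockname() == listener.getsockname()

listener.close()  # the "old process" gives up its copy

# The port stays open because the inherited descriptor still refers to it.
client = socket.create_connection(inherited.getsockname())
conn, _peer = inherited.accept()
client.sendall(b"ping")
accepted = conn.recv(4)

for s in (client, conn, inherited, old_side, new_side):
    s.close()
```

The listening socket is never closed from the kernel's point of view, so connection attempts arriving during the handover are queued rather than refused.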

Envoy stats (overview)
• Store: holds stats
• Sink: protocol adapter (statsd, gRPC, etc.)
• Admin: allows pull access
• Flusher: allows push access
• Scope: discrete grouping of stats that can be deleted. Critical for dynamic xDS on top of hot restart shared memory
[Diagram labels: Store; Counters; Gauges; Histograms; Flusher; Sink; Admin]
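The roles listed above can be sketched as a toy model. The class shapes and method names are assumptions made for the example, not Envoy's interfaces; counters are the only stat type modelled, a sink is any callable that accepts a snapshot, and the stat names are just sample strings.

```python
class Counter:
    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        self.value += amount

class Store:
    """Holds all stats and fans snapshots out to sinks."""

    def __init__(self):
        self._counters = {}
        self._sinks = []

    def add_sink(self, sink):
        self._sinks.append(sink)

    def create_scope(self, prefix):
        return Scope(self, prefix)

    def counters(self):
        # Pull access, as an admin endpoint might use.
        return {name: c.value for name, c in sorted(self._counters.items())}

    def flush(self):
        # Push access: hand the same snapshot to every sink.
        snapshot = self.counters()
        for sink in self._sinks:
            sink(snapshot)

class Scope:
    """A prefix-based grouping of stats that can be deleted as a unit."""

    def __init__(self, store, prefix):
        self._store = store
        self._prefix = prefix

    def counter(self, name):
        return self._store._counters.setdefault(self._prefix + name, Counter())

    def delete(self):
        doomed = [n for n in self._store._counters if n.startswith(self._prefix)]
        for name in doomed:
            del self._store._counters[name]

store = Store()
pushed = []
store.add_sink(pushed.append)  # a stand-in for a statsd or gRPC adapter

listener_scope = store.create_scope("listener.0.")
cluster_scope = store.create_scope("cluster.a.")
listener_scope.counter("downstream_cx_total").inc()
cluster_scope.counter("upstream_rq_total").inc(3)

store.flush()
listener_scope.delete()  # e.g. the listener was removed by a config update
remaining = store.counters()
```

Deleting a scope removes every stat under its prefix in one step, which is what lets dynamically added and removed configuration avoid leaking stats.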

Envoy stats (TLS scopes)
1. Store is global
2. Stats first looked up in TLS cache
3. Not found, allocated in central cache, added to TLS cache
4. Counters/gauges in shared memory
5. Histograms in process memory
6. Scope deletion causes a TLS cache flush on all threads
[Diagram labels: (1) Store (main thread); (2) Scope TLS cache (Gauges, Counters, Histograms); (3) Scope central cache; (4) Shared memory (Stat entries); (5) Process memory (Histograms); (6) TLS scope cache flush]
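Steps 2, 3, and 6 can be sketched as a per-thread cache in front of a locked central cache. The names below are hypothetical. One deliberate substitution: instead of posting a flush to every thread as the slide describes, this sketch bumps a generation number that each thread checks on its next lookup, which is simpler to show in Python. Shared memory (steps 4 and 5) is not modelled.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0

class CachedScope:
    def __init__(self):
        self._central = {}             # central cache, guarded by a lock
        self._lock = threading.Lock()
        self._tls = threading.local()  # per-thread cache, no lock needed
        self._generation = 0           # bumped to invalidate every TLS cache
        self.central_lookups = 0       # instrumentation for the example

    def _thread_cache(self):
        tls = self._tls
        if getattr(tls, "generation", None) != self._generation:
            tls.cache = {}             # first use on this thread, or stale
            tls.generation = self._generation
        return tls.cache

    def counter(self, name):
        cache = self._thread_cache()
        stat = cache.get(name)         # fast path: TLS cache hit
        if stat is None:
            with self._lock:           # slow path: central cache
                self.central_lookups += 1
                stat = self._central.setdefault(name, Counter())
            cache[name] = stat
        return stat

    def flush(self):
        # Stand-in for the TLS cache flush triggered by scope deletion.
        with self._lock:
            self._central.clear()
            self._generation += 1

scope = CachedScope()
scope.counter("rq_total").value += 1
scope.counter("rq_total").value += 1   # served from this thread's cache
lookups_after_main = scope.central_lookups

def worker():
    scope.counter("rq_total").value += 1  # new thread: one central lookup

t = threading.Thread(target=worker)
t.start()
t.join()
lookups_after_worker = scope.central_lookups
total = scope.counter("rq_total").value

scope.flush()
after_flush = scope.counter("rq_total").value  # a fresh stat is allocated
```

The threads here run one after another, so the plain `+= 1` is safe; concurrent increments would need an atomic counter.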

Envoy stats (TLS histograms)
(1) TLS histogram values recorded into “current” without locks
(2) Periodic merge/flush
(3) Post to each worker to swap current histogram (record now happens on alternate)
(4) Post back to main thread to continue merge
(5) Merge all TLS histograms without locks
[Diagram labels: Parent histogram; TLS histogram; Histogram A; Histogram B; &Current histogram; (1) recordValue(...); (2) Merge histogram; (3) TLS post() to swap current; (4) TLS post() back to continue merge; (5) Merge all background histograms]
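The swap-and-merge sequence can be sketched with two buffers per worker. This is a simplified simulation under stated assumptions: the class names are hypothetical, the `post()` calls are replaced by direct method calls on a single thread, and a multiset of raw values stands in for a real bucketed histogram.

```python
from collections import Counter as Bag

class TlsHistogram:
    """One per worker: two buffers, only one of which is recorded into."""

    def __init__(self):
        self._buffers = [Bag(), Bag()]
        self._current = 0

    def record_value(self, value):
        # (1) Only the owning worker writes to the current buffer.
        self._buffers[self._current][value] += 1

    def swap_current(self):
        # (3) Would run on the worker's own event loop; recording moves
        # to the alternate buffer from here on.
        self._current = 1 - self._current

    def take_background(self):
        # (5) The worker no longer writes to this buffer, so the main
        # thread can read and reset it.
        background = self._buffers[1 - self._current]
        taken = Bag(background)
        background.clear()
        return taken

class ParentHistogram:
    def __init__(self, tls_histograms):
        self._tls = tls_histograms
        self.merged = Bag()

    def merge(self):
        # (2) Periodic merge, driven from the main thread.
        for histogram in self._tls:  # stand-in for post() to each worker
            histogram.swap_current()
        for histogram in self._tls:  # stand-in for the post back to main (4)
            self.merged.update(histogram.take_background())

workers = [TlsHistogram(), TlsHistogram()]
parent = ParentHistogram(workers)
workers[0].record_value(5)
workers[1].record_value(5)
workers[1].record_value(20)

parent.merge()
workers[0].record_value(7)  # lands in the alternate buffer, not yet merged
first = dict(parent.merged)

parent.merge()
second = dict(parent.merged)
```

Recording and merging always touch different buffers, which is what allows both sides to proceed without locks.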

Summary
• Bias for developer productivity without sacrificing high throughput and low latency
• Architecture embarrassingly parallel and designed for mostly lock free scaling across high HW thread count
• Heavy use of RCU locking paradigm and TLS
• Design for containerized world
• Extensibility is key
[Slide label: Cloud native summary]

More information
• I’ve written detailed blog posts about these topics
• Please find them on Medium: https://medium.com/@mattklein123

Q&A
• Thanks for coming! Questions welcome on Twitter: @mattklein123
• We are super excited about building a community around Envoy. Talk to us if you need help getting started.
• https://www.envoyproxy.io/
• Lyft is hiring!
