How to double* the performance of vSwitch-based deployments
NUMA-aware vSwitches (and more) in action
Stephen Finucane, OpenStack Software Developer
13th November 2018
What is NUMA?

● UMA (Uniform Memory Access): historically, all memory on an x86 system was equally accessible to all CPUs. Access times were the same no matter which CPU performed the operation.
● NUMA (Non-Uniform Memory Access): system memory is divided into zones (called nodes), which are allocated to particular CPUs or sockets. Access to memory that is local to a CPU is faster than access to memory attached to a remote CPU on the same system.
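To check whether a given host is actually NUMA, Linux exposes the node layout through sysfs and `lscpu`. A minimal sketch, assuming a Linux host with the standard sysfs layout (availability depends on the kernel build):

```shell
# List the online NUMA nodes ("0" on a UMA box, "0-1" or
# similar on a multi-socket NUMA system)
cat /sys/devices/system/node/online

# Per-node CPU lists, if lscpu is installed
lscpu | grep -i numa
```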
What do nova and neutron know?

● Nova knows:
○ How much RAM, disk, and CPU do I have?
○ What is the NUMA topology of my hardware?
○ What hypervisor am I using? etc.
● Neutron knows:
○ What networking driver(s) are available?
○ How much bandwidth is available for a given interface?
○ How do networks map to NICs? etc.
Placement couldn’t do it…

● No nested resource providers
● No NUMA modelling
● No interaction between different services
● Placement only models what it’s told to
Determining NUMA affinity of networks

● Provider networks vs. tenant networks? ❌
● Pre-created networking vs. self-serviced networking? ❌
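Neither of those distinctions maps cleanly onto NUMA affinity, which is why the feature that landed in Rocky instead has the operator state the mapping explicitly in nova.conf on each compute node. A hedged sketch, with option names as in the NUMA-aware vSwitches work and placeholder physnet names for illustration:

```ini
[neutron]
# Physnets whose NUMA affinity nova should consider when scheduling
physnets = provider1,provider2

[neutron_physnet_provider1]
# The NIC(s) backing provider1 are attached to NUMA node 0
numa_nodes = 0

[neutron_physnet_provider2]
numa_nodes = 1

[neutron_tunnel]
# Affinity applied to all tunneled (VXLAN, GRE, ...) networks
numa_nodes = 0
```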
Emulator Thread Pinning

● Hypervisor overhead tasks can steal resources from your vCPUs
● Solution (Ocata): ensure overhead tasks run on a dedicated core
● Solution (Rocky): ensure overhead tasks run on a dedicated pool of cores
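Both behaviours are driven by the `hw:emulator_threads_policy` flavor extra spec. A configuration sketch (the flavor name is a placeholder; the flavor is assumed to already use `hw:cpu_policy=dedicated`):

```shell
# Ocata+: give each instance one extra dedicated host core
# reserved for its emulator threads
openstack flavor set pinned.flavor \
    --property hw:emulator_threads_policy=isolate

# Rocky+: instead run emulator threads on a shared pool of host
# cores, defined per compute node via [compute] cpu_shared_set
openstack flavor set pinned.flavor \
    --property hw:emulator_threads_policy=share
```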
Tracking pCPUs via Placement (Stein?)

● CPU pinning and NUMA are hard to configure and understand
● No way to use vCPUs and pCPUs on the same host
● No way to use vCPUs and pCPUs in the same instance
● Solution: track pCPUs as resources in placement
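The direction here is to carve the host CPUs into a dedicated set and a shared set that placement can account for as separate resource classes. A sketch of the compute-node side, assuming the option names proposed for this work (they may differ in the final release):

```ini
[compute]
# Host CPUs available for pinned (dedicated) instance vCPUs,
# tracked in placement as PCPU resources
cpu_dedicated_set = 2-7

# Host CPUs shared by floating instance vCPUs (VCPU resources)
# and, with hw:emulator_threads_policy=share, emulator threads
cpu_shared_set = 0-1
```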
Summary

● Not accounting for NUMA can cause huge performance hits
● NUMA-aware vSwitches are a thing since Rocky
○ nova.conf-based configuration; mostly a deployment issue
● Future work will explore moving this to placement
● Lots of other features can also help, now and in the future
○ TX/RX queue sizes, emulator thread pinning, vCPU-pCPU coexistence, live migration with NUMA topologies
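Of the extra features listed, the TX/RX queue sizes are also plain nova.conf settings for the libvirt driver. A hedged sketch (my understanding is that values must be powers of two within the libvirt-supported range):

```ini
[libvirt]
# Larger virtio-net queue sizes reduce packet drops under load
# (rx_queue_size available since Pike, tx_queue_size since Queens)
rx_queue_size = 1024
tx_queue_size = 1024
```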