How to double* the performance of vSwitch-based deployments

NUMA-aware vSwitches (and more) in action

Stephen Finucane

November 13, 2018

Transcript

  1. How to double* the performance of
    vSwitch-based deployments
    NUMA-aware vSwitches (and more) in action
    Stephen Finucane
    OpenStack Software Developer
    13th November 2018

  2. Agenda
    ● What is NUMA?
    ● The Problem
    ● A Solution
    ● Common Questions
    ● Bonus Section
    ● Summary
    ● Questions?

  3. What is NUMA?

  4. What is NUMA?
    Non-Uniform Memory Access

  5. What is NUMA?
    UMA (Uniform Memory Access)
    Historically, all memory on x86 systems was equally accessible by all
    CPUs. This is known as Uniform Memory Access (UMA): access times are the
    same no matter which CPU performs the operation.
    NUMA (Non-Uniform Memory Access)
    In Non-Uniform Memory Access (NUMA), system memory is divided into
    zones (called nodes), which are allocated to particular CPUs or sockets.
    Access to memory that is local to a CPU is faster than memory connected
    to remote CPUs on that system.
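
    A quick way to see a host's NUMA layout is numactl; the output below is
    an illustrative sketch for a two-node machine:

    $ numactl --hardware
    available: 2 nodes (0-1)
    node 0 cpus: 0 1 2 3
    node 0 size: 32768 MB
    node 1 cpus: 4 5 6 7
    node 1 size: 32768 MB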

  6. What is NUMA?
    [Diagram: two NUMA nodes, A and B, each with a local memory channel,
    joined by an interconnect; access within a node is local, access across
    the interconnect is remote]

  7. What is NUMA?
    [Diagram: four NUMA nodes, A through D]

  10. The Problem

  11. Types of Networking*
    Kernel vHost (or virtio)
    Low performance, flexible
    Userspace vHost (DPDK)
    High performance, moderately flexible
    SR-IOV
    High performance, inflexible
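
    For a concrete sketch of how these differ on the API side: SR-IOV is
    requested per port via its vNIC type, while kernel vs. userspace vHost
    falls out of the host's vSwitch and mechanism driver rather than the
    port (net0 and the port/server names here are placeholders):

    $ openstack port create --network net0 --vnic-type direct sriov-port0
    $ openstack server create --flavor m1.small --image cirros \
      --nic port-id=sriov-port0 vm0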

  13.-20. [figure slides]

  21. A Solution

  22. Neutron?

  23. What do nova and neutron know?
    ● Nova knows
    ○ How much RAM, DISK, CPU do I have?
    ○ What is the NUMA topology of my hardware?
    ○ What hypervisor am I using? etc.
    ● Neutron knows
    ○ What networking driver(s) are available?
    ○ How much bandwidth is available for a given interface?
    ○ How do networks map to NICs? etc.

  25. Placement?

  26. Placement couldn’t do it…
    ● No nested resource providers
    ● No NUMA modelling
    ● No interaction between different services
    ● Placement models what it’s told to

  27. Nova?

  28. What do nova and neutron know?
    ● Nova knows
    ○ How much RAM, DISK, CPU do I have?
    ○ What is the NUMA topology of my hardware?
    ○ What hypervisor am I using? etc.
    ● Neutron knows
    ○ What networking driver(s) are available?
    ○ How much bandwidth is available for a given interface?
    ○ How do networks map to NICs? etc.

  30. Nova ✅
    (with caveats)

  31. Determining NUMA affinity of networks
    ● Provider networks vs. Tenant networks? ❌

  32. Determining NUMA affinity of networks
    ● Provider networks vs. Tenant networks? ❌
    ● Pre-created networking vs. Self-serviced networking? ❌

  33. Determining NUMA affinity of networks
    ● Provider networks vs. Tenant networks? ❌
    ● Pre-created networking vs. Self-serviced networking? ❌
    ● L2 networks vs. L3 networks? ✅
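
    To tell which case applies to a given network, its provider attributes
    can be inspected (net0 is a placeholder); a physical_network value
    indicates a physnet-backed L2 network, while tunnelled types such as
    vxlan have none:

    $ openstack network show net0 \
      -c provider:network_type -c provider:physical_network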

  34. L2 network configuration (neutron)
    [ovs]
    bridge_mappings = physnet0:br-physnet0
    openvswitch_agent.ini
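
    The bridge named in the mapping must already exist and carry the
    physical NIC for that physnet; a minimal sketch with ovs-vsctl (eth2 is
    a placeholder interface):

    $ ovs-vsctl add-br br-physnet0
    $ ovs-vsctl add-port br-physnet0 eth2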

  35. L2 network configuration (nova)
    [neutron]
    physnets = physnet0,physnet1
    [neutron_physnet_physnet0]
    numa_nodes = 0
    nova.conf

  38. L2 network configuration (nova)
    [neutron]
    physnets = physnet0,physnet1
    [neutron_physnet_physnet0]
    numa_nodes = 0,1
    nova.conf
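
    Putting the two physnets together, a sketch assuming physnet1 hangs off
    the second NUMA node:

    [neutron]
    physnets = physnet0,physnet1
    [neutron_physnet_physnet0]
    numa_nodes = 0
    [neutron_physnet_physnet1]
    numa_nodes = 1
    nova.conf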

  39. L3 network configuration (neutron)
    [ovs]
    local_ip = OVERLAY_INTERFACE_IP_ADDRESS
    openvswitch_agent.ini

  40. L3 network configuration (nova)
    [neutron_tunnel]
    numa_nodes = 1
    nova.conf

  41. L3 network configuration (nova)
    [neutron_tunnel]
    numa_nodes = 0,1
    nova.conf
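
    Worth noting: these affinities are only taken into account for instances
    that themselves have a NUMA topology, requested e.g. via flavor extra
    specs:

    $ openstack flavor set $flavor \
      --property 'hw:numa_nodes=1'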

  42. Common Questions

  43. Common Questions
    ● Why so manual?

  44. Common Questions
    ● Why so manual?
    ● Can I automate any of this?

  45. Common Questions
    ● Why so manual?
    ● Can I automate any of this?
    ● Will this ever move to placement?

  46. Bonus Section

  47. Configurable TX/RX Queue Size
    ● Pre-emption can result in packet drops
    Rocky

  48. Configurable TX/RX Queue Size
    ● Pre-emption can result in packet drops
    ● Solution: make queue sizes bigger! (256 → 1024)
    Rocky

  49. Configurable TX/RX Queue Size
    [libvirt]
    tx_queue_size = 1024
    rx_queue_size = 1024
    nova.conf
    Rocky
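
    One way to confirm the sizes took effect is to look for them in the
    guest's libvirt XML (the instance name below is a placeholder):

    $ virsh dumpxml instance-00000001 | grep queue_size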

  50. Emulator Thread Pinning
    ● Hypervisor overhead tasks can steal resources from your vCPUs
    Ocata

  51. Emulator Thread Pinning
    ● Hypervisor overhead tasks can steal resources from your vCPUs
    ● Solution: ensure overhead tasks run on a dedicated core
    Ocata

  52. Emulator Thread Pinning
    ● Hypervisor overhead tasks can steal resources from your vCPUs
    ● Solution: ensure overhead tasks run on a dedicated core
    ● Solution: ensure overhead tasks run on a dedicated pool of cores
    Ocata Rocky

  53. Emulator Thread Pinning
    $ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=isolate'
    Ocata

  54. Emulator Thread Pinning
    $ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=isolate'
    $ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=share'
    Ocata Rocky

  55. Emulator Thread Pinning
    [compute]
    cpu_shared_set = 0-1
    nova.conf
    Ocata Rocky
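
    The resulting pinning is visible as an emulatorpin element under
    cputune in the guest's libvirt XML; a quick check (instance name is a
    placeholder):

    $ virsh dumpxml instance-00000001 | grep emulatorpin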

  56. Tracking pCPUs via Placement
    ● CPU pinning and NUMA are hard to configure and understand
    ● No way to use vCPUs and pCPUs on the same host
    ● No way to use vCPUs and pCPUs in the same instance
    Stein?

  57. Tracking pCPUs via Placement
    ● CPU pinning and NUMA are hard to configure and understand
    ● No way to use vCPUs and pCPUs on the same host
    ● No way to use vCPUs and pCPUs in the same instance
    ● Solution: track PCPUs as resources in placement
    Stein?

  58. Tracking pCPUs via Placement
    $ openstack flavor set $flavor \
    --property 'resources:PCPU=10' \
    --property 'resources:VCPU=10'
    Stein?

  59. Tracking pCPUs via Placement
    [compute]
    cpu_shared_set = 0-9,20-29
    cpu_dedicated_set = 10-19,30-39
    nova.conf
    Stein?
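
    Until that lands, dedicated CPUs are requested the established way, via
    the CPU policy extra spec:

    $ openstack flavor set $flavor \
      --property 'hw:cpu_policy=dedicated'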

  60. Live migration with pCPUs
    ● Live migration of instances with a NUMA topology is broken
    Stein?

  61. Live migration with pCPUs
    ● Live migration of instances with a NUMA topology is broken
    ● Solution: fix it
    Stein?

  62. Summary

  63. Summary
    ● Not accounting for NUMA can cause huge performance hits

  64. Summary
    ● Not accounting for NUMA can cause huge performance hits
    ● NUMA-aware vSwitches are a thing since Rocky
    ○ nova.conf based configuration, mostly a deployment issue

  65. Summary
    ● Not accounting for NUMA can cause huge performance hits
    ● NUMA-aware vSwitches are a thing since Rocky
    ○ nova.conf based configuration, mostly a deployment issue
    ● Future work will explore moving this to placement

  66. Summary
    ● Not accounting for NUMA can cause huge performance hits
    ● NUMA-aware vSwitches are a thing since Rocky
    ○ nova.conf based configuration, mostly a deployment issue
    ● Future work will explore moving this to placement
    ● Lots of other features that can also help, now and in the future
    ○ TX/RX queue sizes, emulator thread pinning, vCPU-pCPU
    coexistence, live migration with NUMA topologies

  67. Questions?

  68. THANK YOU
    plus.google.com/+RedHat
    linkedin.com/company/red-hat
    youtube.com/user/RedHatVideos
    facebook.com/redhatinc
    twitter.com/RedHatNews

  69. Resources
    You might want to know about these...
    ● RHEL NUMA Tuning Guide
    ● Attaching physical PCI devices to guests
    ● Nova Flavors Guide
    ● NUMA-aware vSwitches spec
    ● Emulator Thread Pinning spec (out-of-date!)
    ● TX/RX Queue Sizes spec
    ● CPU Tracking via Placement spec (draft)
