Slide 1

Slide 1 text

How to double* the performance of vSwitch-based deployments
NUMA-aware vSwitches (and more) in action
Stephen Finucane, OpenStack Software Developer
13th November 2018

Slide 2

Slide 2 text

Agenda
● What is NUMA?
● The Problem
● A Solution
● Common Questions
● Bonus Section
● Summary
● Questions?

Slide 3

Slide 3 text

What is NUMA?

Slide 4

Slide 4 text

What is NUMA?
Non-Uniform Memory Access

Slide 5

Slide 5 text

What is NUMA?

UMA (Uniform Memory Access)
Historically, all memory on x86 systems was equally accessible by all CPUs. Known as Uniform Memory Access (UMA), access times are the same no matter which CPU performs the operation.

NUMA (Non-Uniform Memory Access)
In Non-Uniform Memory Access (NUMA), system memory is divided into zones (called nodes), which are allocated to particular CPUs or sockets. Access to memory that is local to a CPU is faster than access to memory connected to remote CPUs on that system.
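A host's CPU-to-node layout can be inspected with `numactl --hardware`. As an illustrative sketch (the sample output below is made up and real output has more fields), the `node N cpus:` lines can be parsed into a CPU-to-node map:

```python
# Minimal sketch: build a CPU -> NUMA node map from `numactl --hardware`
# style output. SAMPLE is invented for illustration, not from a real host.
SAMPLE = """\
node 0 cpus: 0 1 2 3
node 0 size: 16384 MB
node 1 cpus: 4 5 6 7
node 1 size: 16384 MB
"""

def cpu_to_node(output):
    """Return {cpu_id: node_id} parsed from 'node N cpus: ...' lines."""
    mapping = {}
    for line in output.splitlines():
        parts = line.split()
        # Lines of interest look like: "node 0 cpus: 0 1 2 3"
        if len(parts) > 3 and parts[0] == "node" and parts[2] == "cpus:":
            node = int(parts[1])
            for cpu in parts[3:]:
                mapping[int(cpu)] = node
    return mapping

print(cpu_to_node(SAMPLE))
```

On a real host you would feed in the output of `subprocess.run(["numactl", "--hardware"], ...)` instead of the canned string.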

Slide 6

Slide 6 text

What is NUMA?
[Diagram: node A and node B, each with its own memory channel, joined by an interconnect; access to a CPU's own node is local, access across the interconnect is remote]

Slide 7

Slide 7 text

What is NUMA?
[Diagram: four NUMA nodes, node A to node D]

Slide 8

Slide 8 text


Slide 9

Slide 9 text


Slide 10

Slide 10 text

The Problem

Slide 11

Slide 11 text

Types of Networking*
● Kernel vHost (or virtio): low performance, flexible
● Userspace vHost (DPDK): high performance, moderately flexible
● SR-IOV: high performance, inflexible

Slide 12

Slide 12 text


Slide 13

Slide 13 text


Slide 14

Slide 14 text


Slide 15

Slide 15 text


Slide 16

Slide 16 text


Slide 17

Slide 17 text


Slide 18

Slide 18 text


Slide 19

Slide 19 text


Slide 20

Slide 20 text


Slide 21

Slide 21 text

A Solution

Slide 22

Slide 22 text

Neutron?

Slide 23

Slide 23 text

What do nova and neutron know?
● Nova knows:
  ○ How much RAM, disk, and CPU do I have?
  ○ What is the NUMA topology of my hardware?
  ○ What hypervisor am I using? etc.
● Neutron knows:
  ○ What networking driver(s) are available?
  ○ How much bandwidth is available for a given interface?
  ○ How do networks map to NICs? etc.

Slide 24

Slide 24 text


Slide 25

Slide 25 text

Placement?

Slide 26

Slide 26 text

Placement couldn’t do it…
● No nested resource providers
● No NUMA modelling
● No interaction between different services
● Placement models what it’s told to

Slide 27

Slide 27 text

Nova?

Slide 28

Slide 28 text


Slide 29

Slide 29 text


Slide 30

Slide 30 text

Nova ✅ (with caveats)

Slide 31

Slide 31 text

Determining NUMA affinity of networks
● Provider networks vs. tenant networks? ❌

Slide 32

Slide 32 text

Determining NUMA affinity of networks
● Provider networks vs. tenant networks? ❌
● Pre-created networking vs. self-service networking? ❌

Slide 33

Slide 33 text

Determining NUMA affinity of networks
● Provider networks vs. tenant networks? ❌
● Pre-created networking vs. self-service networking? ❌
● L2 networks vs. L3 networks? ✅

Slide 34

Slide 34 text

L2 network configuration (neutron)

openvswitch_agent.ini:
[ovs]
bridge_mappings = physnet0:br-physnet0

Slide 35

Slide 35 text

L2 network configuration (nova)

nova.conf:
[neutron]
physnets = physnet0,physnet1

[neutron_physnet_physnet0]
numa_nodes = 0

Slide 36

Slide 36 text


Slide 37

Slide 37 text


Slide 38

Slide 38 text

L2 network configuration (nova)

nova.conf:
[neutron]
physnets = physnet0,physnet1

[neutron_physnet_physnet0]
numa_nodes = 0,1
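The [neutron] physnets list and the per-physnet [neutron_physnet_*] sections have to stay in sync, so a deployment tool might generate them together. The following is a hypothetical sketch using Python's configparser; only the section and option names come from the slides, the helper itself is invented:

```python
# Hypothetical helper: render the NUMA-aware vSwitch part of nova.conf.
# Only the [neutron]/[neutron_physnet_*] section and option names are from
# the slides; the function name and shape are illustrative.
import configparser
import io

def render_numa_vswitch_conf(physnets, affinity):
    """physnets: all configured physnets; affinity: physnet -> NUMA node IDs."""
    cfg = configparser.ConfigParser()
    cfg["neutron"] = {"physnets": ",".join(physnets)}
    for physnet, nodes in affinity.items():
        # Only physnets with a known NUMA affinity get a dedicated section.
        cfg["neutron_physnet_%s" % physnet] = {
            "numa_nodes": ",".join(str(n) for n in nodes),
        }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()

snippet = render_numa_vswitch_conf(["physnet0", "physnet1"], {"physnet0": [0, 1]})
print(snippet)
```

This reproduces the snippet above: physnet1 is listed but, having no known affinity, gets no [neutron_physnet_physnet1] section.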

Slide 39

Slide 39 text

L3 network configuration (neutron)

openvswitch_agent.ini:
[ovs]
local_ip = OVERLAY_INTERFACE_IP_ADDRESS

Slide 40

Slide 40 text

L3 network configuration (nova)

nova.conf:
[neutron_tunnel]
numa_nodes = 1

Slide 41

Slide 41 text

L3 network configuration (nova)

nova.conf:
[neutron_tunnel]
numa_nodes = 0,1

Slide 42

Slide 42 text

Common Questions

Slide 43

Slide 43 text

Common Questions
● Why so manual?

Slide 44

Slide 44 text

Common Questions
● Why so manual?
● Can I automate any of this?

Slide 45

Slide 45 text

Common Questions
● Why so manual?
● Can I automate any of this?
● Will this ever move to placement?

Slide 46

Slide 46 text

Bonus Section

Slide 47

Slide 47 text

Configurable TX/RX Queue Size (Rocky)
● Pre-emption can result in packet drops

Slide 48

Slide 48 text

Configurable TX/RX Queue Size (Rocky)
● Pre-emption can result in packet drops
● Solution: make queue sizes bigger! (256 → 1024)

Slide 49

Slide 49 text

Configurable TX/RX Queue Size (Rocky)

nova.conf:
[libvirt]
tx_queue_size = 1024
rx_queue_size = 1024

Slide 50

Slide 50 text

Emulator Thread Pinning (Ocata)
● Hypervisor overhead tasks can steal resources from your vCPUs

Slide 51

Slide 51 text

Emulator Thread Pinning (Ocata)
● Hypervisor overhead tasks can steal resources from your vCPUs
● Solution: ensure overhead tasks run on a dedicated core

Slide 52

Slide 52 text

Emulator Thread Pinning (Ocata, Rocky)
● Hypervisor overhead tasks can steal resources from your vCPUs
● Solution: ensure overhead tasks run on a dedicated core (Ocata)
● Solution: ensure overhead tasks run on a dedicated pool of cores (Rocky)

Slide 53

Slide 53 text

Emulator Thread Pinning (Ocata)

$ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=isolate'

Slide 54

Slide 54 text

Emulator Thread Pinning (Ocata, Rocky)

Ocata:
$ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=isolate'

Rocky:
$ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=share'

Slide 55

Slide 55 text

Emulator Thread Pinning (Ocata, Rocky)

nova.conf:
[compute]
cpu_shared_set = 0-1

Slide 56

Slide 56 text

Tracking pCPUs via Placement (Stein?)
● CPU pinning and NUMA are hard to configure and understand
● No way to use vCPUs and pCPUs on the same host
● No way to use vCPUs and pCPUs in the same instance

Slide 57

Slide 57 text

Tracking pCPUs via Placement (Stein?)
● CPU pinning and NUMA are hard to configure and understand
● No way to use vCPUs and pCPUs on the same host
● No way to use vCPUs and pCPUs in the same instance
● Solution: track PCPUs as resources in placement

Slide 58

Slide 58 text

Tracking pCPUs via Placement (Stein?)

$ openstack flavor set $flavor \
    --property 'resources:PCPU=10' \
    --property 'resources:VCPU=10'

Slide 59

Slide 59 text

Tracking pCPUs via Placement (Stein?)

nova.conf:
[compute]
cpu_shared_set = 0-9,20-29
cpu_dedicated_set = 10-19,30-39
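These options use the usual comma-separated range notation for CPU sets. A minimal sketch of how such a string expands into CPU IDs (illustrative only; nova's real parser also handles extras such as `^` exclusions):

```python
# Illustrative parser for CPU set strings like "0-9,20-29".
# Simplified: no "^N" exclusion syntax, no whitespace handling.
def parse_cpu_set(spec):
    """Expand a string such as "0-9,20-29" into a set of CPU IDs."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus

shared = parse_cpu_set("0-9,20-29")
dedicated = parse_cpu_set("10-19,30-39")
# The shared and dedicated pools in the example above must not overlap.
assert shared.isdisjoint(dedicated)
```

With the values from the slide, each pool covers 20 host CPUs, split across what are presumably two NUMA nodes' worth of siblings.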

Slide 60

Slide 60 text

Live migration with pCPUs (Stein?)
● Live migration of instances with a NUMA topology is broken

Slide 61

Slide 61 text

Live migration with pCPUs (Stein?)
● Live migration of instances with a NUMA topology is broken
● Solution: fix it

Slide 62

Slide 62 text

Summary

Slide 63

Slide 63 text

Summary
● Not accounting for NUMA can cause huge performance hits

Slide 64

Slide 64 text

Summary
● Not accounting for NUMA can cause huge performance hits
● NUMA-aware vSwitches have been a thing since Rocky
  ○ nova.conf-based configuration; mostly a deployment issue

Slide 65

Slide 65 text

Summary
● Not accounting for NUMA can cause huge performance hits
● NUMA-aware vSwitches have been a thing since Rocky
  ○ nova.conf-based configuration; mostly a deployment issue
● Future work will explore moving this to placement

Slide 66

Slide 66 text

Summary
● Not accounting for NUMA can cause huge performance hits
● NUMA-aware vSwitches have been a thing since Rocky
  ○ nova.conf-based configuration; mostly a deployment issue
● Future work will explore moving this to placement
● Lots of other features can also help, now and in the future
  ○ TX/RX queue sizes, emulator thread pinning, vCPU-pCPU coexistence, live migration with NUMA topologies

Slide 67

Slide 67 text

Questions?

Slide 68

Slide 68 text

THANK YOU
plus.google.com/+RedHat
linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
facebook.com/redhatinc
twitter.com/RedHatNews

Slide 69

Slide 69 text

Resources
You might want to know about these...
● RHEL NUMA Tuning Guide
● Attaching physical PCI devices to guests
● Nova Flavors Guide
● NUMA-aware vSwitches spec
● Emulator Thread Pinning spec (out-of-date!)
● TX/RX Queue Sizes spec
● CPU Tracking via Placement spec (draft)