How to double* the performance of vSwitch-based deployments

NUMA-aware vSwitches (and more) in action

Stephen Finucane

November 13, 2018

Transcript

  1. How to double* the performance of
    vSwitch-based deployments
    NUMA-aware vSwitches (and more) in action
    Stephen Finucane
    OpenStack Software Developer
    13th November 2018

  2. Agenda
    ● What is NUMA?
    ● The Problem
    ● A Solution
    ● Common Questions
    ● Bonus Section
    ● Summary
    ● Questions?

  3. What is NUMA?

  4. What is NUMA?
    Non-Uniform Memory Access

  5. What is NUMA?
    UMA (Uniform Memory Access)
    Historically, all memory on x86 systems was equally accessible by all
    CPUs. This is known as Uniform Memory Access (UMA): access times are the
    same no matter which CPU performs the operation.
    NUMA (Non-Uniform Memory Access)
    In Non-Uniform Memory Access (NUMA), system memory is divided into
    zones (called nodes), which are allocated to particular CPUs or sockets.
    Access to memory that is local to a CPU is faster than memory connected
    to remote CPUs on that system.
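
    A quick way to see a host's NUMA layout is numactl; the output below is
    an illustrative sketch for a two-node machine:

    $ numactl --hardware
    available: 2 nodes (0-1)
    node 0 cpus: 0 1 2 3
    node 0 size: 32768 MB
    node 1 cpus: 4 5 6 7
    node 1 size: 32768 MB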

  6. What is NUMA?
    [Diagram: two NUMA nodes, A and B, each with a local memory channel,
    joined by an interconnect; access within a node is local, access across
    the interconnect is remote]

  7. What is NUMA?
    [Diagram: four NUMA nodes, A through D]

  10. The Problem

  11. Types of Networking*
    Kernel vHost (or virtio)
    Low performance, flexible
    Userspace vHost (DPDK)
    High performance, moderately flexible
    SR-IOV
    High performance, inflexible
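
    For a concrete sketch of how these differ on the API side: SR-IOV is
    requested per port via its vNIC type, while kernel vs. userspace vHost
    falls out of the host's vSwitch and mechanism driver rather than the
    port (net0 and the port/server names here are placeholders):

    $ openstack port create --network net0 --vnic-type direct sriov-port0
    $ openstack server create --flavor m1.small --image cirros \
      --nic port-id=sriov-port0 vm0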

  13.-20. [figure slides]

  21. A Solution

  22. Neutron?

  23. What do nova and neutron know?
    ● Nova knows
    ○ How much RAM, DISK, CPU do I have?
    ○ What is the NUMA topology of my hardware?
    ○ What hypervisor am I using? etc.
    ● Neutron knows
    ○ What networking driver(s) are available?
    ○ How much bandwidth is available for a given interface?
    ○ How do networks map to NICs? etc.

  25. Placement?

  26. Placement couldn’t do it…
    ● No nested resource providers
    ● No NUMA modelling
    ● No interaction between different services
    ● Placement models what it’s told to

  27. Nova?

  28. What do nova and neutron know?
    ● Nova knows
    ○ How much RAM, DISK, CPU do I have?
    ○ What is the NUMA topology of my hardware?
    ○ What hypervisor am I using? etc.
    ● Neutron knows
    ○ What networking driver(s) are available?
    ○ How much bandwidth is available for a given interface?
    ○ How do networks map to NICs? etc.

  30. Nova ✅
    (with caveats)

  31. Determining NUMA affinity of networks
    ● Provider networks vs. Tenant networks? ❌

  32. Determining NUMA affinity of networks
    ● Provider networks vs. Tenant networks? ❌
    ● Pre-created networking vs. Self-serviced networking? ❌

  33. Determining NUMA affinity of networks
    ● Provider networks vs. Tenant networks? ❌
    ● Pre-created networking vs. Self-serviced networking? ❌
    ● L2 networks vs. L3 networks? ✅
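
    To tell which case applies to a given network, its provider attributes
    can be inspected (net0 is a placeholder); a physical_network value
    indicates a physnet-backed L2 network, while tunnelled types such as
    vxlan have none:

    $ openstack network show net0 \
      -c provider:network_type -c provider:physical_network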

  34. L2 network configuration (neutron)
    [ovs]
    bridge_mappings = physnet0:br-physnet0
    openvswitch_agent.ini
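
    The bridge named in the mapping must already exist and carry the
    physical NIC for that physnet; a minimal sketch with ovs-vsctl (eth2 is
    a placeholder interface):

    $ ovs-vsctl add-br br-physnet0
    $ ovs-vsctl add-port br-physnet0 eth2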

  35. L2 network configuration (nova)
    [neutron]
    physnets = physnet0,physnet1
    [neutron_physnet_physnet0]
    numa_nodes = 0
    nova.conf

  38. L2 network configuration (nova)
    [neutron]
    physnets = physnet0,physnet1
    [neutron_physnet_physnet0]
    numa_nodes = 0,1
    nova.conf
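
    Putting the two physnets together, a sketch assuming physnet1 hangs off
    the second NUMA node:

    [neutron]
    physnets = physnet0,physnet1
    [neutron_physnet_physnet0]
    numa_nodes = 0
    [neutron_physnet_physnet1]
    numa_nodes = 1
    nova.conf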

  39. L3 network configuration (neutron)
    [ovs]
    local_ip = OVERLAY_INTERFACE_IP_ADDRESS
    openvswitch_agent.ini

  40. L3 network configuration (nova)
    [neutron_tunnel]
    numa_nodes = 1
    nova.conf

  41. L3 network configuration (nova)
    [neutron_tunnel]
    numa_nodes = 0,1
    nova.conf
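
    Worth noting: these affinities are only taken into account for instances
    that themselves have a NUMA topology, requested e.g. via flavor extra
    specs:

    $ openstack flavor set $flavor \
      --property 'hw:numa_nodes=1'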

  42. Common Questions

  43. Common Questions
    ● Why so manual?

  44. Common Questions
    ● Why so manual?
    ● Can I automate any of this?

  45. Common Questions
    ● Why so manual?
    ● Can I automate any of this?
    ● Will this ever move to placement?

  46. Bonus Section

  47. Configurable TX/RX Queue Size
    ● Pre-emption can result in packet drops
    Rocky

  48. Configurable TX/RX Queue Size
    ● Pre-emption can result in packet drops
    ● Solution: make queue sizes bigger! (256 → 1024)
    Rocky

  49. Configurable TX/RX Queue Size
    [libvirt]
    tx_queue_size = 1024
    rx_queue_size = 1024
    nova.conf
    Rocky
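
    One way to confirm the sizes took effect is to look for them in the
    guest's libvirt XML (the instance name below is a placeholder):

    $ virsh dumpxml instance-00000001 | grep queue_size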

  50. Emulator Thread Pinning
    ● Hypervisor overhead tasks can steal resources from your vCPUs
    Ocata

  51. Emulator Thread Pinning
    ● Hypervisor overhead tasks can steal resources from your vCPUs
    ● Solution: ensure overhead tasks run on a dedicated core
    Ocata

  52. Emulator Thread Pinning
    ● Hypervisor overhead tasks can steal resources from your vCPUs
    ● Solution: ensure overhead tasks run on a dedicated core
    ● Solution: ensure overhead tasks run on a dedicated pool of cores
    Ocata Rocky

  53. Emulator Thread Pinning
    $ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=isolate'
    Ocata

  54. Emulator Thread Pinning
    $ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=isolate'
    $ openstack flavor set $flavor \
    --property 'hw:emulator_threads_policy=share'
    Ocata Rocky

  55. Emulator Thread Pinning
    [compute]
    cpu_shared_set = 0-1
    nova.conf
    Ocata Rocky
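
    The resulting pinning is visible as an emulatorpin element under
    cputune in the guest's libvirt XML; a quick check (instance name is a
    placeholder):

    $ virsh dumpxml instance-00000001 | grep emulatorpin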

  56. Tracking pCPUs via Placement
    ● CPU pinning and NUMA are hard to configure and understand
    ● No way to use vCPUs and pCPUs on the same host
    ● No way to use vCPUs and pCPUs in the same instance
    Stein?

  57. Tracking pCPUs via Placement
    ● CPU pinning and NUMA are hard to configure and understand
    ● No way to use vCPUs and pCPUs on the same host
    ● No way to use vCPUs and pCPUs in the same instance
    ● Solution: track PCPUs as resources in placement
    Stein?

  58. Tracking pCPUs via Placement
    $ openstack flavor set $flavor \
    --property 'resources:PCPU=10' \
    --property 'resources:VCPU=10'
    Stein?

  59. Tracking pCPUs via Placement
    [compute]
    cpu_shared_set = 0-9,20-29
    cpu_dedicated_set = 10-19,30-39
    nova.conf
    Stein?
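
    Until that lands, dedicated CPUs are requested the established way, via
    the CPU policy extra spec:

    $ openstack flavor set $flavor \
      --property 'hw:cpu_policy=dedicated'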

  60. Live migration with pCPUs
    ● Live migration of instances with a NUMA topology is broken
    Stein?

  61. Live migration with pCPUs
    ● Live migration of instances with a NUMA topology is broken
    ● Solution: fix it
    Stein?

  62. Summary

  63. Summary
    ● Not accounting for NUMA can cause huge performance hits

  64. Summary
    ● Not accounting for NUMA can cause huge performance hits
    ● NUMA-aware vSwitches are a thing since Rocky
    ○ nova.conf based configuration, mostly a deployment issue

  65. Summary
    ● Not accounting for NUMA can cause huge performance hits
    ● NUMA-aware vSwitches are a thing since Rocky
    ○ nova.conf based configuration, mostly a deployment issue
    ● Future work will explore moving this to placement

  66. Summary
    ● Not accounting for NUMA can cause huge performance hits
    ● NUMA-aware vSwitches are a thing since Rocky
    ○ nova.conf based configuration, mostly a deployment issue
    ● Future work will explore moving this to placement
    ● Lots of other features that can also help, now and in the future
    ○ TX/RX queue sizes, emulator thread pinning, vCPU-pCPU
    coexistence, live migration with NUMA topologies

  67. Questions?

  68. THANK YOU
    plus.google.com/+RedHat
    linkedin.com/company/red-hat
    youtube.com/user/RedHatVideos
    facebook.com/redhatinc
    twitter.com/RedHatNews

  69. Resources
    You might want to know about these...
    ● RHEL NUMA Tuning Guide
    ● Attaching physical PCI devices to guests
    ● Nova Flavors Guide
    ● NUMA-aware vSwitches spec
    ● Emulator Thread Pinning spec (out-of-date!)
    ● TX/RX Queue Sizes spec
    ● CPU Tracking via Placement spec (draft)
