Slide 1


NFV Performance Prediction Based on Run-Time Instruction Analysis
Speaker: Chun-Fu Kuo
Advisor: Jung-Chun Kao
Date: 2021/07/28
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
Communications and Networking Lab, NTHU

Slide 2


Outline
o Introduction
o Problem Statement
o Proposed Scheme
o Implementation
o Environment
o Measured Result
o Evaluation
o Conclusion

Slide 3


Introduction: Network Function Virtualization (NFV) Architecture

Slide 4


Introduction: Run-to-Completion vs. Pipeline Architecture

Slide 5


Problem Statement
o Rationale for performance prediction
  o Operators: how many servers do I need?
  o Vendors: how does my NF perform on different servers?
o Considerations
  o NF throughput varies with the CPU, NIC, RAM, etc.
  o Throughput does not grow linearly with CPU usage
  o Contention occurs in the cache, memory, and NIC
  o CPU manufacturers use proprietary techniques that are not documented in white papers
o Objective: given the packet arrival rate, predict the CPU usage

Slide 6


Proposed Scheme: Overview
Measurement:
1. Improve NFV Framework
2. Improve Measurement Precision
3. Trace Run-Time Instructions
4. Get CPU Usage / Packet Rate
Preprocessing:
1. Choose Features and Format
2. Average Metrics
Training:
1. Choose Machine Learning Algorithms
2. Address the Lack of Training Data

Slide 7


Proposed Scheme: Packet Processing Timeline
[Timeline figure: each batch consists of a dequeue process, per-packet processing (PKT x N), and an enqueue process]
o NFs process packets in batches to make full use of the cache and reduce overhead
o Each NF has almost the same dequeue process and enqueue process
o NF processing flows differ by NF type and configuration
o So we only need to measure the part that differs: the packet_handler function in each NF
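To make the timeline concrete, here is a minimal sketch of the per-batch pattern described above. It is written in Python for readability; real OpenNetVM NFs implement this loop in C, and the names (dequeue_batch, process_one_batch, packet_handler) are illustrative, not OpenNetVM's actual API.

from collections import deque

def dequeue_batch(rx_ring, batch_size):
    # Common dequeue process: pull up to batch_size packets from the RX ring.
    return [rx_ring.popleft() for _ in range(min(batch_size, len(rx_ring)))]

def process_one_batch(rx_ring, tx_ring, packet_handler, batch_size=32):
    batch = dequeue_batch(rx_ring, batch_size)
    for pkt in batch:
        packet_handler(pkt)   # NF-specific work: the only part we need to trace
    tx_ring.extend(batch)     # common enqueue process

# Toy usage: a "forwarder" whose packet_handler does no per-packet work.
rx, tx = deque(range(100)), deque()
while rx:
    process_one_batch(rx, tx, packet_handler=lambda pkt: None)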

Slide 8


Proposed Scheme: Measurement Method
o The performance (CPU usage) of an NF depends on
  o Packet arrival rate
  o Executed instructions
  o Accessed memory size
o We use DynamoRIO to measure instructions and memory accesses at run time
  o An instrumentation tool developed by HP Labs & MIT
  o Free and open source
  o Cross-architecture: IA-32, AMD64, ARM, AArch64
  o Cross-platform: Linux, Windows, macOS

Slide 9


Proposed Scheme: Preprocessing
o Average the CPU usage and packet rate
  o Since we sample the NF's CPU usage and status over 5-second windows
o Combine each instruction and its memory-access record into a pair (a sketch follows below)
  o Instructions with different memory-access sizes are treated as different features
  o For example: mov_2, mov_6, add_4
  o We call such a pair an "operation" (op)
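As a concrete illustration of this step, here is a minimal Python sketch assuming a simple in-memory list of (mnemonic, memory size) records; the real tracer's output format is not shown in the slides, so the record layout and sample values below are illustrative only.

from collections import Counter

# Hypothetical raw trace: one (mnemonic, memory bytes) record per executed instruction.
trace = [("mov", 2), ("mov", 2), ("add", 4), ("mov", 6), ("mov", 2)]

# Each (instruction, memory-access size) pair becomes one feature, e.g. "mov_2".
op_counts = Counter(f"{mnemonic}_{size}" for mnemonic, size in trace)
print(op_counts)   # Counter({'mov_2': 3, 'add_4': 1, 'mov_6': 1})

# CPU usage and packet rate are averaged over the 5-second sampling windows.
cpu_samples = [64, 66, 65, 65, 65]                       # illustrative samples
pkt_rate_samples = [4200000, 4240000, 4230000, 4250000]  # illustrative samples
avg_cpu = sum(cpu_samples) / len(cpu_samples)
avg_pkt_rate = sum(pkt_rate_samples) / len(pkt_rate_samples)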

Slide 10


Proposed Scheme: Training
o We compare four machine learning methods
  o Linear Regression
  o Decision Tree
  o AdaBoost
  o Gradient Boosting
o Training data format
  o x = [pkt_rate, # of op_1, # of op_2, ..., # of op_n]
  o y = [cpu_usage]
o Example: with features [pkt_rate, add_4, mov_2, mov_6], x = [4230000, 5, 3, 2] and y = [65]
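A minimal sketch of this data format with scikit-learn, using toy numbers patterned after the example above; the real X and y come from the traced NFs, and the feature order [pkt_rate, add_4, mov_2, mov_6] is assumed for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training rows in the format x = [pkt_rate, # of op_1, ..., # of op_n], y = [cpu_usage].
X = np.array([[4230000, 5, 3, 2],
              [2115000, 5, 3, 2],
              [4230000, 10, 6, 4]], dtype=float)
y = np.array([65.0, 33.0, 90.0])

model = LinearRegression().fit(X, y)
print(model.predict(np.array([[3000000.0, 5, 3, 2]])))   # predicted CPU usage (%)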

Slide 11


Implementation: Overview
1. OpenNetVM introduction & our improvement
2. Measurement tracer details
3. Improvement for NF paucity
4. Machine learning parameters
5. Operations (instruction & memory-size pairs) we use
6. Topology of VNFs

Slide 12


Implementation: Overview
1. OpenNetVM introduction & our improvement
2. Measurement tracer details
3. Improvement for NF paucity
4. Machine learning parameters
5. Operations (instruction & memory-size pairs) we use
6. Topology of VNFs

Slide 13


Implementation: OpenNetVM as Framework
o We adopt OpenNetVM as our NFV framework
  o It is also used by many academic researchers
  o It is lightweight, fast, and complete, and it uses a pipeline architecture
  o It builds on NetVM, which was presented at NSDI 2014

Slide 14


Implementation: OpenNetVM Defect
o However, we observed unstable and inconsistent NF performance
o Because the manager flushed the queue no matter how many packets had arrived, NFs often processed partially filled batches
[Timeline figure: with batch size 2, four packets require two batches, each with its own dequeue and enqueue process; with batch size 4, the same packets fit in a single batch]

Slide 15


Implementation: OpenNetVM Defect
[Figure: original flowchart of OpenNetVM]

Slide 16


Implementation: Improve OpenNetVM
o We propose a workaround that leverages the input packet rate (see the sketch below):
  o Slow packet input rate ⇨ fast counter increment rate
  o Fast packet input rate ⇨ slow counter increment rate
o It makes sure each batch is as full as possible
o The added latency per packet is < 0.01 ms when the threshold is 1000
o Performance improvement: ~20%
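For clarity, here is a minimal Python sketch of how such a counter-plus-threshold flush policy could look. The actual workaround lives inside OpenNetVM's manager (C code), so the function and variable names below are illustrative assumptions, not the real implementation.

def maybe_flush(batch, batch_size, counter, threshold=1000):
    # Returns (flush_now, new_counter) for one polling iteration.
    if len(batch) >= batch_size:
        return True, 0           # full batch: flush immediately
    counter += 1                 # batch not full yet: slow input -> counter climbs quickly
    if counter >= threshold:
        return True, 0           # bound the extra latency (< 0.01 ms at threshold 1000)
    return False, counter

# Toy usage: with a trickle of packets, the counter (not a full batch) triggers the flush.
batch, counter = [object()], 0
for _ in range(1200):
    flush, counter = maybe_flush(batch, batch_size=32, counter=counter)
    if flush:
        break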

Slide 17


Implementation: Improve OpenNetVM
[Figure: original vs. improved flowchart of OpenNetVM]

Slide 18


Implementation: Overview
1. OpenNetVM introduction & our improvement
2. Measurement tracer details
3. Improvement for NF paucity
4. Machine learning parameters
5. Operations (instruction & memory-size pairs) we use
6. Topology of VNFs

Slide 19


Implementation: Measurement Tracer
o We trace the packet_handler function during NF run time
o It exports the executed instructions & memory accesses for each thread separately
o Built with the DynamoRIO library
  o Works on AMD64 and AArch64 Linux
  o Can trace almost every executable
o About 650 lines of code (LoC) in C

Slide 20


Implementation: Measurement Tracer

Slide 21


Implementation: Overview
1. OpenNetVM introduction & our improvement
2. Measurement tracer details
3. Improvement for NF paucity
4. Machine learning parameters
5. Operations (instruction & memory-size pairs) we use
6. Topology of VNFs

Slide 22


Implementation: Redundant Code for Variety
o To address the lack of training data
o We add redundant code to NFs with several busy_time settings (a sketch follows below)
o This increases the variety of NFs
o We apply it to the Firewall & Forwarder
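A minimal sketch of the idea, assuming busy_time is interpreted as the amount of extra dummy work per packet; the redundant code is actually added to the C NFs, and its exact form and scaling are not shown in the slides.

def packet_handler_with_busy_time(pkt, busy_time):
    # Redundant work whose amount scales with busy_time, so the same NF yields
    # several distinct instruction mixes (and CPU usages) for training.
    acc = 0
    for _ in range(busy_time * 1000):   # 1000 iterations per busy_time unit is an assumption
        acc += 1
    # ... the NF's original per-packet processing would follow here ...
    return pkt

packet_handler_with_busy_time(pkt=b"\x00" * 64, busy_time=3)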

Slide 23


Implementation: Overview
1. OpenNetVM introduction & our improvement
2. Measurement tracer details
3. Improvement for NF paucity
4. Machine learning parameters
5. Operations (instruction & memory-size pairs) we use
6. Topology of VNFs

Slide 24


Implementation: Machine Learning Parameters
o Models (scikit-learn):
  1. LinearRegression()
  2. DecisionTreeRegressor(max_depth=50)
  3. AdaBoostRegressor(DecisionTreeRegressor(max_depth=40), n_estimators=300, random_state=np.random.RandomState(1))
  4. GradientBoostingRegressor(n_estimators=300, max_depth=40, min_samples_split=3, learning_rate=0.1, loss='lad')
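For reference, a self-contained sketch that instantiates these four models, fits them, and reports the Mean Absolute Error used later in the evaluation. The data below is synthetic stand-in data in the x = [pkt_rate, op counts ...] / y = [cpu_usage] format, not the thesis's measured training set; note that newer scikit-learn versions rename loss='lad' to loss='absolute_error'.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

models = {
    "LR": LinearRegression(),
    "DT": DecisionTreeRegressor(max_depth=50),
    "AB": AdaBoostRegressor(DecisionTreeRegressor(max_depth=40),
                            n_estimators=300,
                            random_state=np.random.RandomState(1)),
    "GB": GradientBoostingRegressor(n_estimators=300, max_depth=40,
                                    min_samples_split=3, learning_rate=0.1,
                                    loss="lad"),
}

# Synthetic data: first feature is the packet rate, the rest are operation counts.
rng = np.random.default_rng(1)
X = rng.integers(1, 10, size=(40, 4)).astype(float)
X[:, 0] *= 1_000_000
y = 1e-5 * X[:, 0] + 2.0 * X[:, 1:].sum(axis=1)   # toy relation, not a measured one

for name, model in models.items():
    model.fit(X[:30], y[:30])
    print(name, "MAE:", mean_absolute_error(y[30:], model.predict(X[30:])))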

Slide 25


Implementation: Overview
1. OpenNetVM introduction & our improvement
2. Measurement tracer details
3. Improvement for NF paucity
4. Machine learning parameters
5. Operations (instruction & memory-size pairs) we use
6. Topology of VNFs

Slide 26


Implementation: Operations for Training
o Only 50 operations are used as training features (Forwarder's ∪ Firewall's)
o Even though our testing data contain more operations than this table shows, these 50 are enough
[Table: the 50 selected operations]

Slide 27


Implementation: Overview
1. OpenNetVM introduction & our improvement
2. Measurement tracer details
3. Improvement for NF paucity
4. Machine learning parameters
5. Operations (instruction & memory-size pairs) we use
6. Topology of VNFs

Slide 28


Implementation: NFs Topology
o 6 topologies of training & testing VNFs
o 3 NFs for training, 3 NFs for testing
[Figure: six service chains from In to Drop, composed of Forwarder, Firewall, Busy Forwarder, Busy Firewall, AES Encryption, AES Decryption, Payload Scanner, and Flow Tracker, split into training and testing sets]

Slide 29


Environment: Server Specs & NIC Rate Conversion Table
o Device under test (DUT):
  o CPU: Intel Core i9-7920X 2.90 GHz, 12C/24T
  o Memory: 64 GB
  o Motherboard: ASUS ROG RAMPAGE VI EXTREME
  o NIC: Aquantia AQtion AQC107 10 Gbps (onboard)
  o NFV MANO: OpenNetVM v20.10
  o OS: Ubuntu 20.04 LTS
o Traffic generator (TG):
  o CPU: Intel Core i5-8400 2.80 GHz, 6C/6T
  o Memory: 24 GB
  o Motherboard: Dell Inspiron 3670 (0H4VK7)
  o NIC: ASUS XG-C100C (chip: Aquantia AQtion AQC107 10 Gbps)
  o Packet generator: Pktgen-DPDK v20.03.0
  o OS: Ubuntu 20.04 LTS
o DUT & TG are connected back-to-back
o Packet size: 64 bytes
o Packet protocol: UDP

Slide 30


Environment: DUT Optimization
o Disable Hyper-Threading
o Isolate 10 CPU cores, one for each NF
o Disable the GUI to avoid interrupt costs
[Figure: interrupt activity with the GUI enabled vs. disabled]

Slide 31


Measured Result: CPU Usage of Forwarder & Firewall
o CPU usage of the Forwarder & Firewall with batch size 32
o It does not grow linearly as busy_time increases
[Chart: CPU usage (%) vs. busy_time (0-6) for Forwarder and Firewall at 30% and 18% NIC rate]

Slide 32


Measured Result: CPU Usage of Forwarder & Firewall
o CPU usage of the Forwarder & Firewall with batch size 8
o It does not grow linearly as busy_time increases
[Chart: CPU usage (%) vs. busy_time (0-6) for Forwarder and Firewall at 30% and 18% NIC rate]

Slide 33


Measured Result: CPU Usage of AES, Payload Scanner & Flow Tracker
o CPU usage (%) at each NIC rate:

NF @ batch size      | 5% | 10% | 15% | 20% | 25% | 30%
Payload Scan @ 32    | 20 | 39  | 45  | 45  | 45  | 47
Flow Tracker @ 32    | 30 | 38  | 53  | 64  | 68  | 81
Payload Scan @ 8     | 25 | 30  | 48  | 50  | 50  | 54
Flow Tracker @ 8     | 31 | 43  | 54  | 70  | 74  | 83

NF @ batch size      | 1% | 2% | 3% | 4% | 5%
AES Encryption @ 32  | 51 | 53 | 54 | 73 | 85
AES Decryption @ 32  | 51 | 53 | 53 | 73 | 85
AES Encryption @ 8   | 50 | 61 | 64 | 66 | 82
AES Decryption @ 8   | 52 | 64 | 64 | 66 | 86

o Because AES computation is more complex, its NIC rate range is lower

Slide 34


Evaluation: Prediction Error of AES Decryption
o AES Decryption is similar to AES Encryption, so the Mean Absolute Error (MAE) is very low
o Without AES Encryption in the training data, the MAE rises dramatically

Batch size: 32 (predicted vs. real CPU usage (%) per NIC rate)
ML algo                 | 1%    | 2%    | 3%    | 4%    | 5%    | MAE   | MAX   | Median
Linear regression (LR)  | 59.66 | 61.27 | 62.89 | 64.49 | 66.13 | 10.74 | 18.87 | 8.66
Decision tree (DT)      | 51    | 53    | 54    | 73    | 85    | 0.2   | 1     | 0
AdaBoost (AB)           | 54    | 54    | 54    | 73    | 85    | 1     | 3     | 1
Gradient Boosting (GB)  | 51    | 52.38 | 54    | 73    | 85    | 0.32  | 1     | 0
Real                    | 51    | 53    | 53    | 73    | 85    |       |       |

Batch size: 8 (predicted vs. real CPU usage (%) per NIC rate)
ML algo                 | 1%    | 2%    | 3%    | 4%    | 5%    | MAE   | MAX   | Median
Linear regression (LR)  | 60.24 | 62.25 | 64.27 | 66.29 | 68.35 | 5.64  | 17.65 | 1.75
Decision tree (DT)      | 50    | 61    | 64    | 66    | 82    | 1.8   | 4     | 2
AdaBoost (AB)           | 50    | 61    | 61    | 66    | 82    | 2.4   | 4     | 3
Gradient Boosting (GB)  | 50    | 60.97 | 64    | 66    | 82    | 1.81  | 4     | 2
Real                    | 52    | 64    | 64    | 66    | 86    |       |       |
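For clarity, the MAE, MAX, and Median columns can be reproduced from the predicted and real rows, assuming the error statistics are taken over the NIC-rate points in each row (which matches the numbers shown). A minimal sketch using the Decision tree row for batch size 32:

import numpy as np

# Decision tree predictions and real measurements for batch size 32 (values from the table above).
pred = np.array([51.0, 53.0, 54.0, 73.0, 85.0])
real = np.array([51.0, 53.0, 53.0, 73.0, 85.0])

err = np.abs(pred - real)
print(err.mean(), err.max(), np.median(err))   # 0.2 1.0 0.0, i.e. the MAE, MAX, Median columns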

Slide 35


Evaluation: Prediction Error of Payload Scanner

Batch size: 8 (predicted vs. real CPU usage (%) per NIC rate)
ML algo                 | 5%    | 10%   | 15%   | 20%   | 25%   | 30%   | MAE   | MAX   | Median
Linear regression (LR)  | -7.6  | 2.36  | 11.87 | 21.18 | 30.31 | 39.44 | 26.57 | 36.13 | 28.23
Decision tree (DT)      | 27    | 27    | 27    | 38    | 38    | 55    | 8.5   | 21    | 7.5
AdaBoost (AB)           | 36    | 36    | 36    | 38    | 40    | 57    | 9     | 12    | 10.5
Gradient Boosting (GB)  | 36.76 | 36.76 | 36.76 | 38.2  | 42.97 | 56.05 | 8.44  | 11.8  | 9.13
Real                    | 25    | 30    | 48    | 50    | 50    | 54    |       |       |

Batch size: 32 (predicted vs. real CPU usage (%) per NIC rate)
ML algo                 | 5%    | 10%   | 15%   | 20%   | 25%   | 30%   | MAE   | MAX   | Median
Linear regression (LR)  | -6.33 | 1.54  | 9.08  | 16.48 | 23.73 | 30.98 | 27.59 | 37.46 | 27.42
Decision tree (DT)      | 22    | 22    | 22    | 24    | 27    | 27    | 16.83 | 23    | 19
AdaBoost (AB)           | 30    | 30    | 30    | 30    | 33    | 33    | 12.5  | 15    | 13
Gradient Boosting (GB)  | 29.25 | 29.25 | 29.15 | 28.84 | 35.98 | 35.98 | 11.84 | 16.16 | 10.39
Real                    | 20    | 39    | 45    | 45    | 45    | 47    |       |       |

Slide 36


Evaluation: Prediction Error of Flow Tracker

Batch size: 32 (predicted vs. real CPU usage (%) per NIC rate)
ML algo                 | 5%    | 10%   | 15%   | 20%   | 25%   | 30%   | MAE   | MAX   | Median
Linear regression (LR)  | 12.57 | 20.5  | 28.08 | 35.66 | 42.76 | 50.04 | 24.07 | 30.96 | 25.08
Decision tree (DT)      | 22    | 22    | 22    | 30    | 30    | 57    | 25.17 | 38    | 27.5
AdaBoost (AB)           | 38    | 38    | 38    | 44    | 44    | 57    | 15.17 | 24    | 17.5
Gradient Boosting (GB)  | 41.22 | 41.22 | 41.22 | 40.09 | 49.52 | 56.63 | 15.5  | 24.37 | 15.13
Real                    | 30    | 38    | 53    | 64    | 68    | 81    |       |       |

Batch size: 8 (predicted vs. real CPU usage (%) per NIC rate)
ML algo                 | 5%    | 10%   | 15%   | 20%   | 25%   | 30%   | MAE   | MAX   | Median
Linear regression (LR)  | 9.61  | 19.57 | 29.07 | 38.61 | 47.16 | 56.12 | 25.81 | 31.39 | 25.88
Decision tree (DT)      | 28    | 28    | 28    | 41    | 41    | 41    | 24.67 | 42    | 27.5
AdaBoost (AB)           | 41    | 41    | 41    | 41    | 57    | 57    | 16.17 | 29    | 15
Gradient Boosting (GB)  | 43.19 | 43.19 | 43.19 | 46.93 | 56.11 | 56.11 | 15.18 | 26.89 | 15.04
Real                    | 31    | 43    | 54    | 70    | 74    | 83    |       |       |

Slide 37


Conclusion
o Objective:
  o Given the throughput (packets per second), predict the CPU usage
o Proposed scheme:
  o Measure the instructions of each NF's critical function at run time
  o Then use machine learning methods to predict the CPU usage
o Prediction results:
  o If a similar NF exists in the training data, the prediction is very accurate
  o Even for NFs that never appear in the training data, MAEs are all < 16