Chun-Fu Kuo
Advisor: Jung-Chun Kao
Date: 2021/07/28
Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
Communications and Networking Lab, NTHU
Motivation for performance prediction
o Operators: how many servers should I provision?
o Vendors: how does my NF perform on different servers?
o Considerations
o The throughput of NFs varies with the CPU, NIC, RAM, etc.
o Throughput does not grow linearly with CPU usage
o Contention happens in the cache, memory, and NIC
o CPU manufacturers have proprietary techniques that are not documented in white papers
o Objective:
o Given the packet arrival rate, predict the CPU usage
Workflow: Measurement → Preprocessing → Training
o Measurement
1. Improve NFV Framework
2. Improve Measurement Precision
3. Trace Run-Time Instructions
4. Get CPU Usage / Packet Rate
o Preprocessing
1. Choose Features and Format
2. Average Metrics
o Training
1. Choose Machine Learning Algorithms
2. Address the Lack of Training Data
[Figure: timeline of batch processing — dequeue process, per-packet handling, enqueue process]
o NFs process packets in batches to make full use of the cache and reduce per-packet overhead
o Every NF has almost the same dequeue process and enqueue process
o NF processing flows differ by NF type and configuration
o So we only have to measure the differing part: the packet_handler function of each NF (see the sketch below)
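A minimal sketch of that common structure, in Python pseudocode only (real OpenNetVM NFs are C programs; dequeue_batch, packet_handler, and enqueue_batch are stand-in names):

```python
# Illustrative only: real OpenNetVM NFs are written in C. The point is that
# the dequeue/enqueue parts are shared, so only packet_handler needs tracing.
def nf_loop(dequeue_batch, packet_handler, enqueue_batch):
    while True:
        batch = dequeue_batch()        # common to every NF
        for pkt in batch:
            packet_handler(pkt)        # NF-specific: the part we measure
        enqueue_batch(batch)           # common to every NF
```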
o The performance (CPU usage) of an NF depends on
o Packet arrival rate
o Executed instructions
o Accessed memory size
o We use DynamoRIO to measure instructions and memory accesses at run time
o An instrumentation tool originally developed by HP Labs & MIT
o Free and open source
o Cross-architecture: IA-32, AMD64, ARM, AArch64
o Cross-platform: Linux, Windows, macOS
o Average the CPU usage and packet rate, since we sample the NF status (CPU usage) over 5-second windows
o Combine instruction and memory records into pairs
o Instructions with different memory access sizes are treated as different features
o For example: 𝑚𝑜𝑣_2, 𝑚𝑜𝑣_6, 𝑎𝑑𝑑_4
o We call such a pair an “𝑜𝑝𝑒𝑟𝑎𝑡𝑖𝑜𝑛” or “𝑜𝑝” (see the sketch below)
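A minimal sketch of this pairing step, assuming a simplified trace format (the (opcode, mem_bytes) tuple form is an assumption; the real records come from our DynamoRIO client):

```python
# Fold a per-thread trace of (opcode, memory-access size) records into
# op-feature counts such as mov_2 or add_4.
from collections import Counter

def count_ops(records):
    """records: iterable of (opcode, mem_bytes) tuples, e.g. ("mov", 2)."""
    return dict(Counter(f"{opcode}_{mem_bytes}" for opcode, mem_bytes in records))

# Example: three mov_2, one mov_6, two add_4
trace = [("mov", 2)] * 3 + [("mov", 6)] + [("add", 4)] * 2
print(count_ops(trace))   # {'mov_2': 3, 'mov_6': 1, 'add_4': 2}
```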
o We compare four machine learning methods
o Linear Regression
o Decision Tree
o AdaBoost
o Gradient Boosting
o Training data format (see the sketch below)
o x = [𝑝𝑘𝑡_𝑟𝑎𝑡𝑒, # of 𝑜𝑝_1, # of 𝑜𝑝_2, ..., # of 𝑜𝑝_𝑛]
o y = [𝑐𝑝𝑢_𝑢𝑠𝑎𝑔𝑒]
o Example (columns 𝑝𝑘𝑡_𝑟𝑎𝑡𝑒, 𝑎𝑑𝑑_4, 𝑚𝑜𝑣_2, 𝑚𝑜𝑣_6): x = [4230000, 5, 3, 2], y = [65]
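A sketch of the four-model comparison, assuming scikit-learn and NumPy with default hyperparameters (the slide does not specify the library or settings):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# Rows of X follow the format above: [pkt_rate, # of op_1, ..., # of op_n];
# y is the measured CPU usage in percent.
MODELS = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(),
    "AdaBoost": AdaBoostRegressor(),
    "Gradient Boosting": GradientBoostingRegressor(),
}

def compare(X_train, y_train, X_test, y_test):
    for name, model in MODELS.items():
        model.fit(X_train, y_train)
        mae = mean_absolute_error(y_test, model.predict(X_test))
        print(f"{name}: MAE = {mae:.2f}")

# Toy row shaped like the single example on this slide; real training uses
# many samples gathered from the measurement pipeline.
X = np.array([[4230000, 5, 3, 2]], dtype=float)   # pkt_rate, add_4, mov_2, mov_6
y = np.array([65.0])
```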
o We adopt OpenNetVM as our NFV framework
o It is also adopted by many academic researchers
o It is lightweight, fast, complete, and uses a pipeline architecture
o It follows the NetVM design proposed at NSDI 2014
o However, we observed unstable and inconsistent NF performance
o The manager flushes the queue regardless of how many packets are in the arriving batch, so the effective batch size varies
[Figure: timelines comparing the same four packets processed as two batches of size 2 versus one batch of size 4]
o We propose a workaround which leverages the input packet rate (see the sketch below)
o Slow packet input rate ⇨ fast counter increment rate
o Fast packet input rate ⇨ slow counter increment rate
o It keeps each batch as full as possible
o The added latency per packet is < 0.01 ms when the threshold is 1000
o Performance improvement: ~20%
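One plausible reading of this counter mechanism, as a Python sketch only (the actual change lives in the OpenNetVM manager's C polling loop; BATCH_SIZE, FLUSH_THRESHOLD, poll_packets, and deliver are hypothetical names): the counter advances whenever a poll finds the batch still not full, so slow input reaches the threshold quickly and bounds latency, while fast input fills batches before the threshold matters.

```python
# Hypothetical illustration of the counter-based flush logic described above.
BATCH_SIZE = 32
FLUSH_THRESHOLD = 1000   # the threshold of 1000 mentioned on this slide

def rx_loop(poll_packets, deliver):
    batch, counter = [], 0
    while True:
        batch.extend(poll_packets(BATCH_SIZE - len(batch)))
        if len(batch) >= BATCH_SIZE:
            deliver(batch)              # full batch: deliver immediately
            batch, counter = [], 0
        else:
            counter += 1                # slow input -> counter grows quickly
            if counter >= FLUSH_THRESHOLD and batch:
                deliver(batch)          # flush a partial batch to bound latency
                batch, counter = [], 0
```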
o We trace the packet_handler function while the NF is running
o The tracer exports the executed instructions & memory accesses of each thread separately
o Built on the DynamoRIO library
o Works on AMD64 and AArch64 Linux
o Can trace almost every executable
o About 650 lines of code (LoC) in C
o To address the lack of training data
o We add redundant code to NFs with several 𝑏𝑢𝑠𝑦_𝑡𝑖𝑚𝑒 settings
o This increases the variety of NF behavior (see the sketch below)
o We apply it to the Firewall & Forwarder
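A sketch of the resulting collection loop, under the assumption that 𝑏𝑢𝑠𝑦_𝑡𝑖𝑚𝑒 is a per-NF knob and that a hypothetical run_and_measure helper deploys the NF, replays traffic, and returns one (feature vector, CPU usage) sample; the busy loop itself is redundant C code inside the NF:

```python
# Hypothetical augmentation loop: one NF type yields several differently
# behaving samples by sweeping the busy_time setting.
def augment(run_and_measure, nf_name, busy_times=range(7)):
    samples = []
    for busy_time in busy_times:
        x, y = run_and_measure(nf_name, busy_time)   # one (features, cpu_usage) pair
        samples.append((x, y))
    return samples
```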
o Only 50 operations are used as training features (Forwarder’s ∪ Firewall’s)
o Although the testing data contain more operations than this table, these 50 are enough
o 6 topologies of training & testing VNFs
o 3 NFs for training, 3 NFs for testing
o Each topology: In → training NFs → testing NF → Drop
1. In → Forwarder → Firewall → Forwarder → Busy Forwarder → Drop
2. In → Forwarder → Firewall → Forwarder → AES Encryption → Drop
3. In → Forwarder → Firewall → Forwarder → AES Decryption → Drop
4. In → Forwarder → Firewall → Forwarder → Payload Scanner → Drop
5. In → Forwarder → Forwarder → Forwarder → Busy Firewall → Drop
6. In → Forwarder → Firewall → Forwarder → Flow Tracker → Drop
o Device under test (DUT):
o CPU: Intel Core i9-7920X 2.90 GHz, 12C/24T
o Memory: 64 GB
o Motherboard: ASUS ROG RAMPAGE VI EXTREME
o NIC: Aquantia AQtion AQC107 10 Gbps (onboard)
o NFV MANO: OpenNetVM v20.10
o OS: Ubuntu 20.04 LTS
o Traffic generator (TG):
o CPU: Intel Core i5-8400 2.80 GHz, 6C/6T
o Memory: 24 GB
o Motherboard: Dell Inspiron 3670 (0H4VK7)
o NIC: ASUS XG-C100C (chip: Aquantia AQtion AQC107 10 Gbps)
o Packet generator: Pktgen-DPDK v20.03.0
o OS: Ubuntu 20.04 LTS
o DUT & TG are connected back-to-back
o Packet size: 64 bytes
o Packet protocol: UDP
o CPU usage of Forwarder & Firewall with batch size 32
o It is not linear as 𝑏𝑢𝑠𝑦_𝑡𝑖𝑚𝑒 increases
[Figure: CPU Usage (%) vs. Busy Time (0–6) for Forward @ 30%, Forward @ 18%, Firewall @ 30%, Firewall @ 18%]
o CPU usage of Forwarder & Firewall with batch size 8
o It is not linear as 𝑏𝑢𝑠𝑦_𝑡𝑖𝑚𝑒 increases
[Figure: CPU Usage (%) vs. Busy Time (0–6) for Forward @ 30%, Forward @ 18%, Firewall @ 30%, Firewall @ 18%]
o Goal: given the throughput (packets per second), predict the CPU usage
o Proposed scheme:
o Measure the instructions of the NF’s critical function (packet_handler)
o Then use machine learning methods to predict the CPU usage
o Prediction results:
o If a similar NF exists in the training data, the prediction is quite accurate
o Even for NFs that have never been trained on, MAEs are all < 16 (see below)
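For reference, our reading of the metric: MAE is the mean absolute error of the predicted CPU usage over the n test samples, so MAE < 16 means the prediction is off by fewer than 16 percentage points of CPU usage on average:

MAE = (1/n) · Σᵢ |𝑐𝑝𝑢_𝑢𝑠𝑎𝑔𝑒ᵢ − 𝑐𝑝𝑢_𝑢𝑠𝑎𝑔𝑒ᵢ (predicted)|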