
NTHU CS Master's Thesis Oral Defense -- NFV Performance Prediction Based on Run-Time Instruction Analysis

2021.07.28 NTHU CS Master's Thesis Oral Defense
Thesis defense for the master's degree in Computer Science, NTHU


JackKuo

July 29, 2021

Transcript

  1. NFV Performance Prediction Based on Run-Time Instruction Analysis

     Speaker: Chun-Fu Kuo
     Advisor: Jung-Chun Kao
     Date: 2021/07/28
     Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
     Communications and Networking Lab, NTHU
  2. Outline

     o Introduction
     o Problem Statement
     o Proposed Scheme
     o Implementation
     o Environment
     o Measured Result
     o Evaluation
     o Conclusion
  3. Introduction: Network Function Virtualization (NFV) Architecture
  4. Introduction: Run-to-Completion vs. Pipeline Architecture
  5. Problem Statement

     o Rationale for performance prediction
       o Operators: how many servers should I provision?
       o Vendors: how does my NF perform on different servers?
     o Considerations
       o NF throughput varies with the CPU, NIC, RAM, etc.
       o Throughput does not grow linearly with CPU usage
       o Contention happens in the cache, memory, and NIC
       o CPU manufacturers use proprietary techniques that are not documented in white papers
     o Objective: given the packet arrival rate, predict the CPU usage
  6. Proposed Scheme: Overview

     Measurement:
     1. Improve the NFV framework
     2. Improve measurement precision
     3. Trace run-time instructions
     4. Get CPU usage / packet rate

     Preprocessing:
     1. Choose features and format
     2. Average metrics

     Training:
     1. Choose machine learning algorithms
     2. Mitigate the lack of training data
  7. Proposed Scheme: Packet Processing Timeline

     [Timeline figure: packets move through the dequeue process, batch process, and enqueue process.]
     o NFs process packets in batches to make full use of the cache and reduce overhead
     o Every NF has nearly identical dequeue and enqueue processes
     o NF processing flows differ by type and configuration
     o So we only have to measure the part that differs: the packet_handler function in each NF
  8. Proposed Scheme: Measurement Method

     o The performance (CPU usage) of an NF depends on
       o Packet arrival rate
       o Executed instructions
       o Accessed memory size
     o Use DynamoRIO to measure instructions and memory accesses at run time
       o An instrumentation tool developed by HP Labs & MIT
       o Free and open source
       o Cross-architecture: IA-32, AMD64, ARM, AArch64
       o Cross-platform: Linux, Windows, macOS
  9. Proposed Scheme: Preprocessing

     o Average the CPU usage and packet rate
       o We sample 5 seconds of CPU usage and NF status
     o Combine instruction and memory-access records into pairs
       o Instructions with different memory access sizes are treated as different features
       o For example: mov_2, mov_6, add_4
       o We call such a pair an "operation", or "op"
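The op-pair construction and averaging above can be sketched in Python; the trace record format and function names here are assumptions for illustration, not the thesis's actual code:

```python
from collections import Counter

def build_op_features(trace):
    """Combine each instruction with its memory access size into one
    feature name (e.g. ("mov", 2) -> "mov_2") and count occurrences.
    `trace` is assumed to be a list of (mnemonic, access_size) pairs."""
    return dict(Counter(f"{mnemonic}_{size}" for mnemonic, size in trace))

def average_samples(samples):
    """Average (cpu_usage, pkt_rate) tuples over the sampling window
    (the slides sample 5 seconds of NF status)."""
    n = len(samples)
    return (sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n)

trace = [("mov", 2), ("mov", 6), ("mov", 2), ("add", 4)]
print(build_op_features(trace))  # {'mov_2': 2, 'mov_6': 1, 'add_4': 1}
```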
  10. Proposed Scheme: Training

     o We take 4 machine learning methods for comparison
       o Linear Regression
       o Decision Tree
       o AdaBoost
       o Gradient Boosting
     o Training data format
       o x = [pkt_rate, # of op_1, # of op_2, ..., # of op_n]
       o y = [cpu_usage]
     o Example (pkt_rate, add_4, mov_2, mov_6): x = [4230000, 5, 3, 2], y = [65]
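The training-data layout above can be reproduced with a small helper; the FEATURE_OPS order follows the slide's example row, and the helper name is hypothetical:

```python
# Feature layout from the slide: x = [pkt_rate, count(op_1), ..., count(op_n)]
FEATURE_OPS = ["add_4", "mov_2", "mov_6"]

def make_sample(pkt_rate, op_counts):
    """Build one training row; ops missing from the trace count as 0."""
    return [pkt_rate] + [op_counts.get(op, 0) for op in FEATURE_OPS]

x = make_sample(4230000, {"add_4": 5, "mov_2": 3, "mov_6": 2})
y = [65]  # CPU usage (%)
print(x)  # [4230000, 5, 3, 2]
```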
  11. Implementation: Overview

     1. OpenNetVM introduction & our improvement
     2. Measurement tracer details
     3. Improvement for NF paucity
     4. Machine learning parameters
     5. Operations (instruction & memory-size pairs) we take
     6. Topology of VNFs
  12. Implementation: Overview
  13. Implementation: OpenNetVM as Framework

     o We adopt OpenNetVM as our NFV framework
       o It is also adopted by many academic researchers
       o It is lightweight, fast, and complete, and uses a pipeline architecture
       o Proposed in NSDI 2014
  14. Implementation: OpenNetVM Defect

     o However, we observed unstable & inconsistent NF performance
       o Because no matter how many packets are in the arriving batch, the manager flushes the queue
     [Timeline figure: with batch size 2, each dequeue/enqueue pass handles 2 packets; with batch size 4, it handles 4.]
  15. Implementation: OpenNetVM Defect

     Original flowchart of OpenNetVM
  16. Implementation: Improve OpenNetVM

     o We propose a workaround which leverages the input packet rate:
       o Slow packet input rate ⇨ fast counter increment rate
       o Fast packet input rate ⇨ slow counter increment rate
     o It ensures the batch is as full as possible
     o The added latency per packet is < 0.01 ms when the threshold is 1000
     o Performance improvement: ~20%
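A minimal sketch of the counter-based flush logic, assuming a polling loop: the batch is handed off only when full or when the idle counter passes the threshold, so slow input raises the counter quickly while fast input keeps batches full. The class and names are illustrative, not OpenNetVM's actual C code (which operates on DPDK rings):

```python
FLUSH_THRESHOLD = 1000  # the slides report < 0.01 ms added latency at 1000
BATCH_SIZE = 32

class BatchingQueue:
    """Flush only when the batch is full, or when the idle counter
    exceeds the threshold."""
    def __init__(self):
        self.batch = []
        self.counter = 0
        self.flushed = []  # stand-in for handing the batch to the NF

    def poll(self, pkt=None):
        if pkt is not None:
            self.batch.append(pkt)
        else:
            self.counter += 1  # no packet arrived this iteration
        if len(self.batch) >= BATCH_SIZE or (self.batch and self.counter >= FLUSH_THRESHOLD):
            self.flushed.append(list(self.batch))
            self.batch.clear()
            self.counter = 0

q = BatchingQueue()
for _ in range(BATCH_SIZE):
    q.poll(pkt="PKT")
print(len(q.flushed[0]))  # 32
```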
  17. Implementation: Improve OpenNetVM

     Original vs. improved flowchart of OpenNetVM
  18. Implementation: Overview
  19. Implementation: Measurement Tracer

     o We trace the packet_handler function while the NF runs
       o The tracer exports the executed instructions & memory accesses for each thread separately
     o Built with the DynamoRIO library
       o Works on AMD64 and AArch64 Linux
       o Can trace almost any executable
     o About 650 lines of code (LoC) in C
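Downstream preprocessing has to consume the tracer's per-thread export. A small sketch of that step; the line format used here (thread id, mnemonic, access size, count) is an assumption, since the slides do not show the tracer's exact output format:

```python
import io

# Hypothetical per-thread tracer output: "thread_id mnemonic access_size count"
SAMPLE = """\
1 mov 2 5
1 add 4 3
2 mov 2 7
"""

def parse_trace(stream):
    """Group op counts per thread, keyed as 'mnemonic_size'."""
    per_thread = {}
    for line in stream:
        tid, mnemonic, size, count = line.split()
        ops = per_thread.setdefault(int(tid), {})
        key = f"{mnemonic}_{size}"
        ops[key] = ops.get(key, 0) + int(count)
    return per_thread

print(parse_trace(io.StringIO(SAMPLE)))
# {1: {'mov_2': 5, 'add_4': 3}, 2: {'mov_2': 7}}
```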
  20. Implementation: Measurement Tracer

  21. Implementation: Overview
  22. Implementation: Redundant Code for Variety

     o To address the lack of training data:
       o Add redundant code to NFs with several busy_time values
       o This increases the variety of NFs
       o We apply it to the Firewall & Forwarder
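The busy_time padding could look like the following sketch; the loop body and the scale factor are assumptions, since the slides only state that redundant code is added with several busy_time values:

```python
def packet_handler_with_busy_time(pkt, busy_time):
    """Forwarder-style handler padded with redundant work scaled by
    busy_time, to generate NF variants for training data. The loop body
    and the x1000 scale factor are illustrative assumptions."""
    acc = 0
    for _ in range(busy_time * 1000):  # redundant instructions; higher busy_time = heavier NF
        acc += 1
    return pkt  # forward the packet unchanged

print(packet_handler_with_busy_time("PKT", 3))  # PKT
```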
  23. Implementation: Overview
  24. Implementation: Machine Learning Parameters

     o Models:
     1. LinearRegression()
     2. DecisionTreeRegressor(max_depth=50)
     3. AdaBoostRegressor(DecisionTreeRegressor(max_depth=40), n_estimators=300, random_state=np.random.RandomState(1))
     4. GradientBoostingRegressor(n_estimators=300, max_depth=40, min_samples_split=3, learning_rate=0.1, loss='lad')
  25. Implementation: Overview
  26. Implementation: Operations for Training

     o Only 50 operations are taken as training features (Forwarder's ∪ Firewall's)
     o Although our testing data contain more operations than this table, these 50 are sufficient
  27. Implementation: Overview
  28. Implementation: NF Topology

     o 6 topologies of training & testing VNFs
     o 3 NFs for training, 3 NFs for testing
     [Table of service chains; most have the form In → Forwarder → Firewall → Forwarder → X → Drop, where X is Busy Forwarder for training and AES Encryption, AES Decryption, Payload Scanner, or Flow Tracker for testing.]
  29. Environment: Server Specs & NIC Rate Conversion Table

     o Device under test (DUT):
       o CPU: Intel Core i9-7920X 2.90 GHz 12C24T
       o Memory: 64 GB
       o Motherboard: ASUS ROG RAMPAGE VI EXTREME
       o NIC: Aquantia AQtion AQC107 10 Gbps onboard
       o NFV MANO: OpenNetVM v20.10
       o OS: Ubuntu 20.04 LTS
     o Traffic Generator (TG):
       o CPU: Intel Core i5-8400 2.80 GHz 6C6T
       o Memory: 24 GB
       o Motherboard: Dell Inspiron 3670 (0H4VK7)
       o NIC: ASUS XG-C100C (chip: Aquantia AQtion AQC107 10 Gbps)
       o Packet generator: Pktgen-DPDK v20.03.0
       o OS: Ubuntu 20.04 LTS
     o DUT & TG are connected back-to-back
     o Packet size: 64 bytes
     o Packet protocol: UDP
  30. Environment: DUT Optimization

     o Disable Hyper-Threading
     o Isolate 10 CPU cores, one per NF
     o Disable the GUI to avoid interrupt cost
     [Figure: interrupt activity with the GUI enabled vs. disabled]
  31. Measured Result: CPU Usage of Forwarder & Firewall

     o CPU usage of Forwarder & Firewall with batch size 32
     o It does not grow linearly as busy_time increases
     [Plot: CPU usage (%) vs. busy_time (0-6) for Forwarder and Firewall at 30% and 18% NIC rate]
  32. Measured Result: CPU Usage of Forwarder & Firewall

     o CPU usage of Forwarder & Firewall with batch size 8
     o It does not grow linearly as busy_time increases
     [Plot: CPU usage (%) vs. busy_time (0-6) for Forwarder and Firewall at 30% and 18% NIC rate]
  33. Measured Result: CPU Usage of AES, Payload Scanner & Flow Tracker

     CPU usage (%) by NIC rate:

     NF @ batch size        5%   10%   15%   20%   25%   30%
     Payload Scan @ 32      20    39    45    45    45    47
     Flow Tracker @ 32      30    38    53    64    68    81
     Payload Scan @ 8       25    30    48    50    50    54
     Flow Tracker @ 8       31    43    54    70    74    83

     NF @ batch size        1%    2%    3%    4%    5%
     AES Encryption @ 32    51    53    54    73    85
     AES Decryption @ 32    51    53    53    73    85
     AES Encryption @ 8     50    61    64    66    82
     AES Decryption @ 8     52    64    64    66    86

     o AES computation is heavier, so its NIC rate is lower
  34. Evaluation: Prediction Error of AES Decryption

     o AES Decryption is similar to AES Encryption, so the Mean Absolute Error (MAE) is very low
     o Without AES Encryption in the training data, the MAE rises dramatically

     Batch size: 32 (predicted CPU usage % by NIC rate)
     ML algo                   1%     2%     3%     4%     5%     MAE    MAX   Median
     Linear regression (LR)  59.66  61.27  62.89  64.49  66.13  10.74  18.87   8.66
     Decision tree (DT)      51     53     54     73     85      0.2    1      0
     AdaBoost (AB)           54     54     54     73     85      1      3      1
     Gradient Boosting (GB)  51     52.38  54     73     85      0.32   1      0
     Real                    51     53     53     73     85

     Batch size: 8
     ML algo                   1%     2%     3%     4%     5%     MAE    MAX   Median
     Linear regression (LR)  60.24  62.25  64.27  66.29  68.35   5.64  17.65   1.75
     Decision tree (DT)      50     61     64     66     82      1.8    4      2
     AdaBoost (AB)           50     61     61     66     82      2.4    4      3
     Gradient Boosting (GB)  50     60.97  64     66     82      1.81   4      2
     Real                    52     64     64     66     86
  35. Evaluation: Prediction Error of Payload Scanner

     Batch size: 8 (predicted CPU usage % by NIC rate)
     ML algo                   5%    10%    15%    20%    25%    30%    MAE    MAX   Median
     Linear regression (LR)  -7.6    2.36  11.87  21.18  30.31  39.44  26.57  36.13  28.23
     Decision tree (DT)      27     27     27     38     38     55      8.5   21      7.5
     AdaBoost (AB)           36     36     36     38     40     57      9     12     10.5
     Gradient Boosting (GB)  36.76  36.76  36.76  38.2   42.97  56.05   8.44  11.8    9.13
     Real                    25     30     48     50     50     54

     Batch size: 32
     ML algo                   5%    10%    15%    20%    25%    30%    MAE    MAX   Median
     Linear regression (LR)  -6.33   1.54   9.08  16.48  23.73  30.98  27.59  37.46  27.42
     Decision tree (DT)      22     22     22     24     27     27     16.83  23     19
     AdaBoost (AB)           30     30     30     30     33     33     12.5   15     13
     Gradient Boosting (GB)  29.25  29.25  29.15  28.84  35.98  35.98  11.84  16.16  10.39
     Real                    20     39     45     45     45     47
  36. Evaluation: Prediction Error of Flow Tracker

     Batch size: 32 (predicted CPU usage % by NIC rate)
     ML algo                   5%    10%    15%    20%    25%    30%    MAE    MAX   Median
     Linear regression (LR)  12.57  20.5   28.08  35.66  42.76  50.04  24.07  30.96  25.08
     Decision tree (DT)      22     22     22     30     30     57     25.17  38     27.5
     AdaBoost (AB)           38     38     38     44     44     57     15.17  24     17.5
     Gradient Boosting (GB)  41.22  41.22  41.22  40.09  49.52  56.63  15.5   24.37  15.13
     Real                    30     38     53     64     68     81

     Batch size: 8
     ML algo                   5%    10%    15%    20%    25%    30%    MAE    MAX   Median
     Linear regression (LR)   9.61  19.57  29.07  38.61  47.16  56.12  25.81  31.39  25.88
     Decision tree (DT)      28     28     28     41     41     41     24.67  42     27.5
     AdaBoost (AB)           41     41     41     41     57     57     16.17  29     15
     Gradient Boosting (GB)  43.19  43.19  43.19  46.93  56.11  56.11  15.18  26.89  15.04
     Real                    31     43     54     70     74     83
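The MAE, MAX, and Median columns in these evaluation tables can be computed with a short helper; the example row reuses the decision-tree predictions and real values for AES Decryption at batch size 32 from slide 34:

```python
def error_stats(pred, real):
    """MAE, MAX, and Median of absolute prediction error -- the three
    summary columns reported in the evaluation tables."""
    errs = sorted(abs(p - r) for p, r in zip(pred, real))
    n = len(errs)
    mae = sum(errs) / n
    median = errs[n // 2] if n % 2 else (errs[n // 2 - 1] + errs[n // 2]) / 2
    return mae, errs[-1], median

# Decision-tree row vs. real values, AES Decryption @ batch size 32:
pred = [51, 53, 54, 73, 85]
real = [51, 53, 53, 73, 85]
print(error_stats(pred, real))  # (0.2, 1, 0)
```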
  37. Conclusion

     o Objective:
       o Given the throughput (packets per second), predict the CPU usage
     o Proposed scheme:
       o Measure the instructions of each NF's critical function
       o Then use machine learning methods to predict the CPU usage
     o Prediction results:
       o If a similar NF exists in the training data, prediction is quite accurate
       o Even for NFs that were never trained on, all MAEs are < 16