Performance Contracts for Software Network Functions

JackKuo
February 18, 2021

Group meeting presentation of CANLAB in NTHU


Transcript

  1. Performance Contracts for Software Network Functions

    Communications and Networking Lab, NTHU
    Rishabh Iyer, Luis Pedrosa, Arseniy Zaostrovnykh, Solal Pirelli, Katerina Argyraki, and George Candea
    NSDI 2019
    Speaker: Chun-Fu Kuo
    Date: 2021.02.18
  2. Outline

    ▪ Introduction
    ▪ Problem Formulation
    ▪ Proposed Method
    ▪ Evaluation
    ▪ Limitations
    ▪ Conclusion
    ▪ Pros and Cons
  3. Introduction - LLVM

    ▪ The LLVM Project is a collection of modular and reusable compiler and toolchain technologies
    ▪ Famous front end: Clang (for C, C++, Objective-C, Objective-C++)
  4. Introduction - Satisfiability Modulo Theories (SMT)

    ▪ The SMT problem is a decision problem for logical formulas with respect to background theories
    ▪ Examples of theories typically used in computer science
        ▪ The theories of real numbers, lists, arrays, bit vectors, and so on
    ▪ For instance:
        ▪ 3x + 2y − z ≥ 4
        ▪ f(f(u, v), v) = f(u, v)
    ▪ Famous solvers:
        ▪ Z3 (Microsoft, open source)
        ▪ STP (Simple Theorem Prover)
  5. Introduction - Z3

    ▪ A cross-platform SMT solver by Microsoft
    ▪ Usage:
        ▪ A system of equations: a + b = 20, a + 2b = 10
        ▪ An equivalence: ¬(a ∧ b) ≡ (¬a ∨ ¬b)
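For intuition, the two usage examples above can be checked by hand. The sketch below is plain Python and deliberately does not use the Z3 API: it solves the linear system with Cramer's rule and checks the De Morgan equivalence by enumerating the truth table. The function names are made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def solve_two_by_two(a1, b1, c1, a2, b2, c2):
    """Solve a1*a + b1*b = c1 and a2*a + b2*b = c2 with Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system has no unique solution")
    a = Fraction(c1 * b2 - c2 * b1, det)
    b = Fraction(a1 * c2 - a2 * c1, det)
    return a, b


def de_morgan_holds():
    """Check not(a and b) == (not a or not b) for every truth assignment."""
    return all((not (a and b)) == ((not a) or (not b))
               for a, b in product([False, True], repeat=2))
```

A solver such as Z3 answers the same two questions (is the system satisfiable, and is the equivalence valid) without enumerating cases, which is what makes it usable on the much larger formulas produced by symbolic execution.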
  6. Introduction - Symbolic Execution (SE)

    ▪ Uses an SMT solver to prove the correctness of programs (software testing)
    ▪ A.k.a. symbolic evaluation or symbex
  7. Introduction - Symbolic Execution (SE)

    ▪ Example of KLEE (a symbolic execution engine), which analyzes LLVM bitcode
    ▪ Input program:

        #include "klee/klee.h"

        int get_sign(int x) {
            if (x == 0)
                return 0;
            if (x < 0)
                return -1;
            else
                return 1;
        }

        int main() {
            int a;
            /* Mark a as symbolic so KLEE explores every feasible path. */
            klee_make_symbolic(&a, sizeof(a), "a");
            return get_sign(a);
        }

    ▪ Output of the symbolic execution engine:

        KLEE: output directory = "klee-out-0"
        KLEE: done: total instructions = 31
        KLEE: done: completed paths = 3
        KLEE: done: generated tests = 3

    ▪ Generated test inputs (one per path):

        object 0: int : 0
        object 0: int : 16843009
        object 0: int : -2147483648
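To see why KLEE reports three paths and three tests, the following sketch ports `get_sign` to Python and checks that each test input from the KLEE output satisfies exactly one path constraint. `PATHS` and `covered_paths` are names invented for this illustration; KLEE itself derives the constraints from LLVM bitcode and asks a solver for one witness per path.

```python
def get_sign(x):
    """Python port of the C function analyzed on the slide."""
    if x == 0:
        return 0
    if x < 0:
        return -1
    return 1


# Path constraints of the three feasible paths, written as predicates.
PATHS = {
    "x == 0": lambda x: x == 0,
    "x < 0": lambda x: x < 0,
    "x > 0": lambda x: x > 0,
}

# Test inputs reported in the KLEE output on the slide.
KLEE_INPUTS = [0, 16843009, -2147483648]


def covered_paths(inputs):
    """Map each input to the single path constraint it satisfies."""
    result = {}
    for x in inputs:
        matching = [name for name, holds in PATHS.items() if holds(x)]
        assert len(matching) == 1  # the three constraints partition the ints
        result[x] = matching[0]
    return result
```

Calling `covered_paths(KLEE_INPUTS)` shows that the three generated inputs together cover all three paths.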
  8. Introduction - Dynamic Binary Instrumentation (DBI)

    ▪ Instrumentation is performed at run time on the compiled binary files
    ▪ Uses a JIT engine to
        ▪ Analyze and label the executable
        ▪ Insert customized code at run time
    ▪ Famous tools:
        ▪ Intel Pin (supports IA32 and IA64 only, easy to use)
        ▪ DynamoRIO (supports IA-32/AMD64/ARM/AArch64, complicated)
        ▪ Frida (mostly for Android hacking)
  9. Problem Formulation

    ▪ Estimate the cost of deployment
    ▪ Estimate the performance after changing the configuration
    ▪ Estimate the risk under an adversarial workload
  10. System Model

    ▪ Bolt
        ▪ Uses a performance contract to predict the NF performance
        ▪ Performance-critical variables (PCVs) parameterize the contract
        ▪ Uses symbolic execution to find all potential paths in the NF
        ▪ Relies on a library of pre-analyzed stateful NF data structures
    ▪ Bolt Distiller
        ▪ Finds out which execution paths are common in the real world
  11. Proposed Method - Performance Contract for LPM Router

    ▪ An LPM (longest prefix matching) router
        ▪ e.g. 140.114.111.222 / 16
    ▪ The PCV is l
        ▪ The matched prefix length of the IP address
        ▪ (This example ignores all layers below the NF code)
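A contract of this kind can be read as a function from the PCV l to an upper bound on a metric. The sketch below models that idea with a linear bound; the class name and all coefficients are hypothetical and chosen only for illustration, they are not the values from the paper or the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearContract:
    """Upper bound of the form slope * l + constant for one traffic class."""
    slope: int
    constant: int

    def bound(self, l):
        if not 0 <= l <= 32:
            raise ValueError("IPv4 prefix length must be in 0..32")
        return self.slope * l + self.constant


# Hypothetical coefficients, chosen only for illustration.
LPM_INSTRUCTIONS = LinearContract(slope=4, constant=5)
LPM_MEMORY_ACCESSES = LinearContract(slope=1, constant=3)
```

With these made-up numbers, `LPM_INSTRUCTIONS.bound(24)` gives the instruction bound for packets matching a 24-bit prefix, so an operator can evaluate the contract for any l without running the NF.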
  12. Proposed Method - Performance Contract for MAC Bridge

    ▪ Learn which port each source MAC address arrives from
    ▪ Find the matching output port
  13. Proposed Method - Performance Contract for MAC Bridge

    ▪ (Flowchart nodes: MACtable_put(), MACtable_get(), FORWARD(), BROADCAST(), Drop(p), return; branch label: key present)
    ▪ c: number of hash collisions
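The role of the collision count as a PCV can be illustrated with a toy bridge. The sketch below is an assumption-laden stand-in, not the analyzed NF: `MacTable`, its toy hash, and `bridge` are invented here, and the table simply reports how many colliding entries each operation walked.

```python
class MacTable:
    """Chained hash table that reports how many colliding entries it walked."""

    def __init__(self, buckets=4):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, mac):
        # Toy hash: last byte of the MAC modulo the bucket count.
        return self.buckets[int(mac.split(":")[-1], 16) % len(self.buckets)]

    def put(self, mac, port):
        bucket = self._bucket(mac)
        for i, (key, _) in enumerate(bucket):
            if key == mac:
                bucket[i] = (mac, port)
                return i  # colliding entries walked before the hit
        bucket.append((mac, port))
        return len(bucket) - 1  # every existing entry was walked

    def get(self, mac):
        bucket = self._bucket(mac)
        for i, (key, port) in enumerate(bucket):
            if key == mac:
                return port, i
        return None, len(bucket)


def bridge(table, src_mac, dst_mac, in_port):
    """Learn the source port, then forward on a hit or broadcast on a miss."""
    c_put = table.put(src_mac, in_port)
    out_port, c_get = table.get(dst_mac)
    action = "FORWARD" if out_port is not None else "BROADCAST"
    return action, out_port, c_put + c_get
```

The third value returned by `bridge` is the total number of collisions for that packet. The work done per packet grows with it, which is why a contract for this NF is expressed in terms of c rather than as a single constant.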
  14. Proposed Method - Contract Analysis Prototype
  15. Proposed Method - Requirements for Generating a Contract

    ▪ Requirements
        ▪ Well-defined separation between stateful and stateless NF code
        ▪ Pre-analysis library for stateful NF data structures
            ▪ Analyze once, reuse across NFs
    ▪ Choosing appropriate PCVs is a balance between precision and ease of use
        ▪ More PCVs could leak implementation details of the NF
        ▪ Developers need more detail
        ▪ Operators need an easy analysis approach
  16. Proposed Method - Generating Contract
  17. Proposed Method - Generating Contract
  18. Proposed Method - Generating Contract
  19. Proposed Method - Constraints for NF Chains
  20. Proposed Method - Constraints for NF Chains

    ▪ Scenario
        ▪ The firewall drops all packets with IP options
        ▪ The router no longer receives packets with IP options
    ▪ How to improve the contract correctness?
        ▪ Generate performance contracts for the individual NFs in the chain
        ▪ Pair together traffic classes from communicating NFs
        ▪ For each pair, AND the respective constraints together
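The pairing step can be sketched for the firewall and router scenario above. This is a simplified illustration under stated assumptions: the traffic classes, their names, and the two packet fields are invented, the upstream NF is assumed not to modify packets, and satisfiability is checked by brute force over a tiny packet space instead of with a solver.

```python
from itertools import product

# Tiny packet space: every combination of the two fields used below.
PACKETS = [{"ip_options": o, "valid": v}
           for o, v in product([False, True], repeat=2)]

# Each traffic class: (name, constraint on the input packet, forwards?)
FIREWALL = [
    ("fw_drop_options", lambda p: p["ip_options"], False),
    ("fw_pass", lambda p: not p["ip_options"], True),
]
ROUTER = [
    ("rt_invalid", lambda p: not p["valid"], True),
    ("rt_options", lambda p: p["valid"] and p["ip_options"], True),
    ("rt_plain", lambda p: p["valid"] and not p["ip_options"], True),
]


def chain_classes(upstream, downstream, packets):
    """Pair classes of two NFs; keep pairs whose ANDed constraints are satisfiable."""
    feasible = []
    for up_name, up_ok, forwards in upstream:
        if not forwards:
            continue  # dropped packets never reach the downstream NF
        for down_name, down_ok, _ in downstream:
            if any(up_ok(p) and down_ok(p) for p in packets):
                feasible.append((up_name, down_name))
    return feasible
```

In this toy chain the pair `("fw_pass", "rt_options")` is unsatisfiable, so the router's IP-options path drops out of the chain's contract, which matches the scenario on the slide.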
  21. Proposed Method - Implementation Details

    ▪ Instruction replay
        ▪ Use instrumentation to log instructions and accessed memory locations
        ▪ Disable any link-time optimizations
        ▪ Bolt always reports the worst-case performance result
    ▪ Hardware model employed
        ▪ Compute instructions: follow the Intel manual and adopt the worst-case performance (due to out-of-order instruction scheduling)
        ▪ Memory instructions: only model the private L1 data caches (never model the proprietary features: prefetching, parallelism)
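A conservative model of this kind can be sketched as a replay over a logged trace. Everything concrete below is an assumption made for illustration: the direct-mapped cache organization, the latencies, the per-instruction cycle table, and the trace format are not taken from Bolt.

```python
LINE_BYTES = 64          # cache line size (assumed)
NUM_SETS = 512           # direct-mapped 32 KiB L1D (assumed)
L1_HIT_CYCLES = 4        # assumed latency
MISS_CYCLES = 200        # assumed: every L1 miss is charged as a DRAM access
COMPUTE_CYCLES = {"add": 1, "mul": 3, "div": 40}  # assumed worst cases


def estimate_cycles(trace):
    """Replay a trace of ("op", name) and ("mem", address) records.

    Returns a conservative cycle count: latencies are summed serially,
    with no prefetching and no overlap between memory accesses.
    """
    cache = {}  # set index -> tag currently cached
    cycles = 0
    for kind, value in trace:
        if kind == "op":
            cycles += COMPUTE_CYCLES[value]
        elif kind == "mem":
            line = value // LINE_BYTES
            index, tag = line % NUM_SETS, line // NUM_SETS
            if cache.get(index) == tag:
                cycles += L1_HIT_CYCLES
            else:
                cache[index] = tag
                cycles += MISS_CYCLES
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return cycles
```

Because every miss is charged the full assumed DRAM latency and nothing overlaps, the estimate can only err on the high side. That is the trade-off the slide describes: a safe bound at the price of over-estimating code that benefits from prefetching or memory-level parallelism.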
  22. Proposed Method - Implementation Details
  23. Proposed Method - Bolt Distiller

    Why?
    ▪ There are several hundred execution paths for each NF
    ▪ But only some of them are usually triggered in the real world
    How?
    ▪ Input:
        1. The real-world traffic (PCAP file)
        2. The stateless NF code and a slightly modified version of the data structures (tracing the # of loop iterations, logging the matched prefix length)
    Result
    ▪ The matched prefix length in the LPM router is mostly 16~24 bits
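The Distiller idea, replaying traffic and recording which PCV values actually occur, can be sketched as a histogram over matched prefix lengths. The routing table and function names below are hypothetical, and a plain list of destination addresses stands in for the PCAP input.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical forwarding table: prefix -> output port.
ROUTES = {
    ip_network("0.0.0.0/0"): 0,
    ip_network("140.114.0.0/16"): 1,
    ip_network("140.114.111.0/24"): 2,
}


def matched_prefix_length(dst):
    """Return the length of the longest prefix in ROUTES that contains dst."""
    addr = ip_address(dst)
    return max(net.prefixlen for net in ROUTES if addr in net)


def distill(destinations):
    """Histogram of the PCV (matched prefix length) over observed traffic."""
    return Counter(matched_prefix_length(d) for d in destinations)
```

The resulting histogram does not alter the contract. It only shows which values of l are common, so the contract can be evaluated at the values that matter for a given deployment.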
  24. Proposed Method - Bolt Distiller

    ▪ For operators
        ▪ The Distiller can be used to balance risk against resource utilization
    ▪ For developers
        ▪ The Distiller shows which of their assumptions are wrong, so they can optimize the code
    📝 Note: The Distiller doesn’t change the contract; it only tells the user which execution paths are more common
  25. Evaluation

    ▪ Environment
        ▪ CPU: E5-2667
        ▪ RAM: 32 GB
        ▪ NIC: Intel 82599ES 10 Gb (directly connected)
        ▪ Traffic generator: MoonGen
    ▪ NFs
        ▪ Br: MAC bridge
        ▪ LPM: LPM router (uses DPDK data structures)
        ▪ NAT: a formally verified NAT
        ▪ LB: Maglev-like load balancer
  26. Evaluation

    ▪ Bolt generates contracts from many possible execution paths
    ▪ Metrics:
        ▪ # of executed instructions
        ▪ # of memory accesses
        ▪ # of execution cycles
  27. Evaluation - Hardware-Independent Metrics

    ▪ Over-estimation is 7.5%
        ▪ Execution paths are coalesced within the stateful performance contract
        ▪ There are small differences between the analyzed code and the production build
  28. Evaluation - Hardware-Dependent Metrics

    ▪ Over-estimation: 4.08X for the typical workload, 9.26X for the pathological (unconstrained) workload
    ▪ It can be improved with a better hardware model
  29. Evaluation - Hardware-Dependent Metrics

    ▪ To validate the hardware model hypothesis, here is a simple experiment
    ▪ P1: traverse a non-contiguously allocated linked list
        ▪ ❌ MLP (Memory Level Parallelism), ❌ prefetching
        ▪ Error is 5%
    ▪ P2: traverse a linked list in a contiguous chunk of memory
        ▪ ❌ MLP (Memory Level Parallelism), ✅ prefetching
        ▪ Error is 6X
    ▪ P3: traverse an array
        ▪ ✅ MLP (Memory Level Parallelism), ✅ prefetching
        ▪ Error is 9X
  30. Limitations

    ▪ Requirements:
        ▪ Separation of stateful and stateless code
        ▪ Pre-analysis library
    ▪ Doesn’t support multi-threaded NFs with shared state
    ▪ Doesn’t consider contention for cache and memory
    ▪ NFs that perform system calls or share a CPU core cannot be analyzed accurately
    ▪ Needs the source code of the NF for analysis (more convenient for PCV selection)
  31. Conclusion

    ▪ Goal
        ▪ Predict the performance of an NF without executing it
    ▪ Method
        ▪ Use performance-critical variables to describe the NF
        ▪ Symbolic execution helps find all the potential paths
        ▪ Use the Bolt Distiller to learn which paths are more common in the real world
    ▪ Result
        ▪ Predicts the NF performance with an error rate < 8%
  32. Pros & Cons

    ▪ Pros
        ▪ PCVs are flexible for analyzing NF performance
        ▪ The pre-analysis library can make the prediction more precise
    ▪ Cons
        ▪ It only predicts CPU cycle and memory access counts, not the most practical metric: CPU usage
        ▪ Disabling link-time optimizations at compile time makes the prediction inaccurate
        ▪ Intel Pin is proprietary software and can only run on IA32/IA64