Performance Contracts for Software Network Functions

Group meeting presentation of CANLAB in NTHU

JackKuo

February 18, 2021

Transcript

  1. Performance Contracts for Software Network Functions

    Communications and Networking Lab, NTHU
    Rishabh Iyer, Luis Pedrosa, Arseniy Zaostrovnykh, Solal Pirelli, Katerina Argyraki, and George Candea
    NSDI 2019
    Speaker: Chun-Fu Kuo
    Date: 2021.02.18
  2. Outline

    ▪ Introduction
    ▪ Problem Formulation
    ▪ Proposed Method
    ▪ Evaluation
    ▪ Limitations
    ▪ Conclusion
    ▪ Pros and Cons
  3. Introduction - LLVM

    ▪ The LLVM Project is a collection of modular and reusable compiler and toolchain technologies
    ▪ Famous front end: Clang (for C, C++, Objective-C, Objective-C++)
  4. Introduction - Satisfiability Modulo Theories (SMT)

    ▪ The SMT problem is a decision problem for logical formulas expressed over background theories
    ▪ Examples of theories typically used in computer science
      ▪ The theories of real numbers, lists, arrays, bit vectors, and so on
    ▪ For instance:
      ▪ 3x + 2y − z ≥ 4
      ▪ f(f(u, v), v) = f(u, v)
    ▪ Famous solvers:
      ▪ Z3 (Microsoft, open source)
      ▪ STP (Simple Theorem Prover)
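To make "decision problem" concrete, here is a minimal sketch that decides the linear-arithmetic example above by brute force over a small integer range. The helper name `find_model` and the search bounds are assumptions made for illustration; a real SMT solver reasons symbolically and does not enumerate values.

```python
from itertools import product

def find_model(constraint, names, lo=-5, hi=5):
    """Return the first assignment in [lo, hi]^n satisfying constraint, or None."""
    for values in product(range(lo, hi + 1), repeat=len(names)):
        env = dict(zip(names, values))
        if constraint(**env):
            return env
    return None

# The linear-arithmetic example from the slide: 3x + 2y - z >= 4
sat_model = find_model(lambda x, y, z: 3 * x + 2 * y - z >= 4, ["x", "y", "z"])

# A contradictory formula has no model at all, so the bounded search finds none.
unsat_model = find_model(lambda x: x > 0 and x < 0, ["x"])
```

A returned dictionary is a witness that the formula is satisfiable. `None` only means that no model exists inside the searched range, which is weaker than a solver's "unsat" answer.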
  5. Introduction - Z3

    ▪ A cross-platform SMT solver by Microsoft
    ▪ Usage:
      ▪ A system of equations: a + b = 20, a + 2b = 10
      ▪ An equivalence: ¬(a ∧ b) ≡ (¬a ∨ ¬b)
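The two queries on this slide can be checked by hand without a solver. The sketch below is plain Python, not the Z3 API: it solves the 2x2 system exactly with Cramer's rule and confirms the De Morgan equivalence by exhausting the truth table. The helper names `solve_2x2` and `is_valid` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*a + a12*b = b1 and a21*a + a22*b = b2 exactly (Cramer's rule)."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("system is singular")
    a = Fraction(b1 * a22 - a12 * b2) / det
    b = Fraction(a11 * b2 - b1 * a21) / det
    return a, b

def is_valid(formula, arity):
    """A formula is valid when no Boolean assignment falsifies it."""
    return all(formula(*bits) for bits in product([False, True], repeat=arity))

# a + b = 20, a + 2b = 10
a, b = solve_2x2(1, 1, 20, 1, 2, 10)

# not (p and q)  ==  (not p) or (not q)
de_morgan = is_valid(lambda p, q: (not (p and q)) == ((not p) or (not q)), 2)
```

The system has the single solution a = 30, b = −10. Checking validity by showing that the negation has no satisfying assignment mirrors how an SMT solver is typically asked to prove an equivalence.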
  6. Introduction - Symbolic Execution (SE)

    ▪ Uses SMT solving to check the correctness of programs (software testing)
    ▪ Also known as symbolic evaluation or symbex
  7. Introduction - Symbolic Execution (SE)

    ▪ Example of KLEE (a symbolic execution engine), which analyzes LLVM bitcode

        #include "klee/klee.h"

        int get_sign(int x) {
          if (x == 0)
            return 0;
          if (x < 0)
            return -1;
          else
            return 1;
        }

        int main() {
          int a;
          /* Mark a as symbolic so that KLEE explores every feasible path. */
          klee_make_symbolic(&a, sizeof(a), "a");
          return get_sign(a);
        }

    ▪ Output of the symbolic execution engine:

        KLEE: output directory = "klee-out-0"
        KLEE: done: total instructions = 31
        KLEE: done: completed paths = 3
        KLEE: done: generated tests = 3

    ▪ Generated test inputs, one per path:

        object 0: int : 0
        object 0: int : 16843009
        object 0: int : -2147483648
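Why three paths yield three tests can be sketched in a few lines of Python. This is only an illustration of the idea, not how KLEE works internally: the path conditions of `get_sign` are written out by hand, and a witness for each is picked from a hypothetical candidate list instead of being produced by an SMT solver.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1

# Each feasible path of get_sign: (path condition, value returned on that path).
PATHS = [
    (lambda x: x == 0, 0),
    (lambda x: x != 0 and x < 0, -1),
    (lambda x: x != 0 and not (x < 0), 1),
]

# Hypothetical candidate inputs; a real engine asks a solver for a witness instead.
CANDIDATES = [0, 1, -1, 16843009, INT_MIN, INT_MAX]

def get_sign(x):
    if x == 0:
        return 0
    return -1 if x < 0 else 1

def generate_tests():
    """Return one concrete input per path, chosen to satisfy that path's condition."""
    tests = []
    for condition, expected in PATHS:
        witness = next(x for x in CANDIDATES if condition(x))
        # Running the concrete input must follow the path it was generated for.
        assert get_sign(witness) == expected
        tests.append(witness)
    return tests

tests = generate_tests()
```

Each generated input drives the program down a different branch, so the three tests together cover every path of the function.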
  8. Introduction - Dynamic Binary Instrumentation (DBI)

    ▪ Instrumentation is performed at run time on compiled binary files
    ▪ Uses a JIT engine to
      ▪ Analyze and label the executable
      ▪ Insert customized code at run time
    ▪ Famous tools:
      ▪ Intel Pin (supports IA-32 and Intel 64 only, easy to use)
      ▪ DynamoRIO (supports IA-32/AMD64/ARM/AArch64, more complicated)
      ▪ Frida (mostly used for Android hacking)
  9. Problem Formulation

    ▪ Estimate the cost of a deployment
    ▪ Estimate the performance after changing the configuration
    ▪ Estimate the risk under an adversarial workload
  10. System Model

    ▪ Bolt
      ▪ Uses a performance contract to predict NF performance
      ▪ Performance Critical Variables (PCVs) parameterize the contract
      ▪ Uses symbolic execution to find all potential execution paths in the NF
      ▪ Relies on a library of pre-analyzed stateful NF data structures
    ▪ Bolt Distiller
      ▪ Finds out which execution paths are common in the real world
  11. Proposed Method - Performance Contract for LPM Router

    ▪ An LPM (Longest Prefix Matching) router
      ▪ e.g. 140.114.111.222 / 16, mapped to a NIC port
    ▪ The PCV is l
      ▪ The length of the matched IP prefix
    ▪ (This example ignores all layers below the NF code)
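A contract of this shape can be sketched as a cost that grows linearly with l. The example below is an assumption-laden illustration, not Bolt's implementation: the binary trie, the class name `LpmTrie`, and both cost coefficients are hypothetical, and l is taken to be the number of prefix bits the lookup walks, which coincides with the matched prefix length for the address shown.

```python
import ipaddress

PER_BIT_COST = 3   # hypothetical instructions per trie level
FIXED_COST = 10    # hypothetical per-packet overhead

class LpmTrie:
    """Binary trie keyed on the leading bits of an IPv4 address."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr, port):
        net = ipaddress.ip_network(cidr)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["port"] = port

    def lookup(self, addr):
        """Return (port, l): the output port and the number of bits walked."""
        bits = format(int(ipaddress.ip_address(addr)), "032b")
        node, walked = self.root, 0
        port = node.get("port")
        for bit in bits:
            child = node.get(bit)
            if child is None:
                break
            node, walked = child, walked + 1
            port = node.get("port", port)  # remember the longest match so far
        return port, walked

def contract(l):
    """Predicted instruction count as a function of the PCV l."""
    return PER_BIT_COST * l + FIXED_COST

table = LpmTrie()
table.insert("140.114.0.0/16", 1)
table.insert("140.114.111.0/24", 2)
port, l = table.lookup("140.114.111.222")
predicted = contract(l)
```

The point of the contract is that an operator only needs the value of l, not the router's source code, to estimate the per-packet cost.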
  12. Proposed Method - Performance Contract for MAC Bridge

    ▪ Learn which port each source MAC address arrives from
    ▪ Find the matching output port
  13. Proposed Method - Performance Contract for MAC Bridge

    ▪ Flowchart labels: MACtable_put(), MACtable_get(), key present, FORWARD(), BROADCAST(), Drop(p), return
    ▪ C: number of hash collisions
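The role of the hash-collision count as a PCV can be illustrated with a small chained hash table that reports how many colliding entries each operation traverses. This is a sketch under stated assumptions: the `MacTable` class, the byte-sum hash, and the `bridge` helper are hypothetical stand-ins for the bridge's real data structure, and packet dropping is not modeled.

```python
class MacTable:
    """Chained hash table; put/get report how many colliding entries they skip."""

    def __init__(self, buckets=4):
        self.buckets = [[] for _ in range(buckets)]

    def _chain(self, mac):
        # Deterministic toy hash: sum of the six address bytes.
        key = sum(bytes.fromhex(mac.replace(":", "")))
        return self.buckets[key % len(self.buckets)]

    def put(self, mac, port):
        chain = self._chain(mac)
        for skipped, entry in enumerate(chain):
            if entry[0] == mac:
                entry[1] = port
                return skipped
        chain.append([mac, port])
        return len(chain) - 1

    def get(self, mac):
        chain = self._chain(mac)
        for skipped, (key, port) in enumerate(chain):
            if key == mac:
                return port, skipped
        return None, len(chain)

def bridge(table, src, dst, in_port):
    """Learn the source MAC, then forward to the known port or broadcast."""
    c_put = table.put(src, in_port)
    out_port, c_get = table.get(dst)
    action = ("FORWARD", out_port) if out_port is not None else ("BROADCAST", None)
    return action, c_put + c_get

table = MacTable()
first = bridge(table, "00:00:00:00:00:01", "00:00:00:00:00:02", 1)
second = bridge(table, "00:00:00:00:00:05", "00:00:00:00:00:01", 2)
```

The second packet's source address lands in the same bucket as the first, so learning it traverses one colliding entry. A contract would charge a per-collision cost for exactly this kind of extra work.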
  14. Proposed Method - Requirements for Generating a Contract

    ▪ Requirements
      ▪ Well-defined separation between stateful and stateless NF code
      ▪ Pre-analysis library for stateful NF data structures
        ▪ Analyze once, reuse across NFs
    ▪ Choosing appropriate PCVs balances precision against ease of use
      ▪ More PCVs could leak implementation details of the NF
      ▪ Developers need more detail
      ▪ Operators need an easy analysis approach
  15. Proposed Method - Constraints for NF Chains

    ▪ Scenario
      ▪ The firewall drops all packets with IP options
      ▪ The router therefore never receives packets with IP options
    ▪ How to improve the accuracy of the contract?
      ▪ Generate performance contracts for the individual NFs in the chain
      ▪ Pair together traffic classes from communicating NFs
      ▪ For each pair, AND the respective constraints together
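The pairing step can be sketched as follows. Everything concrete here is an assumption for illustration: the one-field packet universe, the traffic-class names, and the costs are made up, and the firewall is assumed not to modify packets, so both NFs' constraints apply to the same packet.

```python
# Hypothetical packet universe: the only modeled field is the presence of IP options.
PACKETS = [{"ip_options": False}, {"ip_options": True}]

# Each traffic class: (name, constraint, forwards downstream?, hypothetical cost).
FIREWALL = [
    ("fw_drop", lambda p: p["ip_options"], False, 20),
    ("fw_pass", lambda p: not p["ip_options"], True, 30),
]
ROUTER = [
    ("rt_options", lambda p: p["ip_options"], True, 500),
    ("rt_plain", lambda p: not p["ip_options"], True, 100),
]

def chain_contract(upstream, downstream):
    """Pair classes of two NFs, keeping pairs whose ANDed constraints are satisfiable."""
    classes = []
    for name_u, cons_u, forwards, cost_u in upstream:
        if not forwards:
            # Dropped upstream: the downstream NF never sees these packets.
            classes.append((name_u, cost_u))
            continue
        for name_d, cons_d, _, cost_d in downstream:
            if any(cons_u(p) and cons_d(p) for p in PACKETS):
                classes.append((name_u + "+" + name_d, cost_u + cost_d))
    return classes

chain = chain_contract(FIREWALL, ROUTER)
worst_case = max(cost for _, cost in chain)
naive_worst_case = max(c for *_, c in FIREWALL) + max(c for *_, c in ROUTER)
```

Because the pair "firewall passes" AND "router sees IP options" is unsatisfiable, the router's expensive class disappears from the chain contract, and the worst case is lower than the sum of the two individual worst cases.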
  16. Proposed Method - Implementation Details

    ▪ Instruction replay
      ▪ Use instrumentation to log instructions and the memory locations they access
      ▪ Disable all link-time optimizations
      ▪ Make Bolt always report the worst-case performance
    ▪ Hardware model employed
      ▪ Compute instructions: follow the Intel manual and adopt the worst-case performance (due to out-of-order instruction scheduling)
      ▪ Memory instructions: model only the private L1 data caches (never model proprietary features such as prefetching and parallelism)
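A conservative hardware model of this kind can be sketched as a replay over a logged trace. The trace format, all latency values, and the infinite-capacity L1 set are assumptions chosen for brevity; they are not the numbers or structures Bolt uses.

```python
LINE_SIZE = 64        # bytes per cache line
L1_HIT_CYCLES = 4     # hypothetical L1 hit latency
MISS_CYCLES = 200     # hypothetical: every L1 miss is charged as a DRAM access
WORST_CASE_LATENCY = {"add": 1, "mul": 3, "div": 40}  # hypothetical placeholders

def estimate_cycles(trace):
    """Replay a trace of ("op", name) or ("mem", address) entries.

    Latencies are summed serially (no overlap, no prefetching), so the
    result is a pessimistic upper-bound style estimate.
    """
    cached_lines = set()  # simplification: an L1 cache that never evicts
    cycles = 0
    for kind, value in trace:
        if kind == "op":
            cycles += WORST_CASE_LATENCY[value]
        else:
            line = value // LINE_SIZE
            cycles += L1_HIT_CYCLES if line in cached_lines else MISS_CYCLES
            cached_lines.add(line)
    return cycles

trace = [("mem", 0x1000), ("op", "add"), ("mem", 0x1008), ("op", "mul"), ("mem", 0x2000)]
cycles = estimate_cycles(trace)
```

The second access falls in the same cache line as the first and is charged as a hit, while the other two are charged the full miss latency. Ignoring prefetching and memory-level parallelism in this way keeps the estimate safe but can make it far larger than the measured value.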
  17. Proposed Method - Bolt Distiller

    ▪ Why?
      ▪ There are several hundred execution paths for each NF
      ▪ But only some of them are usually triggered in the real world
    ▪ How?
      ▪ Input:
        1. Real-world traffic (PCAP file)
        2. Stateless NF code with a slightly modified version of the data structures (tracing the number of loop iterations, logging the matched prefix length)
    ▪ Result
      ▪ The matched prefix lengths in the LPM router are mostly 16 to 24 bits
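The Distiller's output can be pictured as a distribution of observed PCV values. The sketch below assumes the instrumented run has already produced one logged value per packet; the sample values and the function name `distill` are hypothetical, and no PCAP parsing is shown.

```python
from collections import Counter

def distill(pcv_samples):
    """Return {pcv_value: fraction of packets} for one PCV observed over a trace."""
    counts = Counter(pcv_samples)
    total = len(pcv_samples)
    return {value: count / total for value, count in sorted(counts.items())}

# Hypothetical matched prefix lengths, one per replayed packet.
samples = [24, 16, 24, 20, 8, 24, 16, 32, 22, 24]
report = distill(samples)

# Share of packets whose matched prefix length falls between 16 and 24 bits.
share_16_24 = sum(frac for length, frac in report.items() if 16 <= length <= 24)
```

The report leaves the contract itself untouched. It only tells the reader which PCV values, and therefore which execution paths, dominate a given workload.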
  18. Proposed Method - Bolt Distiller

    ▪ For operators
      ▪ The Distiller can be used to balance risk against resource utilization
    ▪ For developers
      ▪ The Distiller helps them see which assumptions are wrong, so they can optimize the code
    📝 Note: The Distiller doesn't change the contract; it only tells the user which execution paths are more common
  19. Evaluation

    ▪ Environment
      ▪ CPU: E5-2667
      ▪ RAM: 32 GB
      ▪ NIC: Intel 82599ES 10 Gb (directly connected)
      ▪ Traffic generator: MoonGen
    ▪ NFs
      ▪ Br: MAC bridge
      ▪ LPM: LPM router (uses the DPDK data structure)
      ▪ NAT: a formally verified NAT
      ▪ LB: Maglev-like load balancer
  20. Evaluation

    ▪ Bolt generates contracts from the many possible execution paths
    ▪ Metrics:
      ▪ Number of executed instructions
      ▪ Number of memory accesses
      ▪ Number of execution cycles
  21. Evaluation - Hardware-Independent Metrics

    ▪ Over-estimation is 7.5%
    ▪ Sources of the over-estimation:
      ▪ Execution paths are coalesced within the stateful performance contract
      ▪ Small differences between the analyzed code and the production build
  22. Evaluation - Hardware-Dependent Metrics

    ▪ Over-estimation of execution cycles: 4.08X for the typical workload, 9.26X for the pathological (unconstrained) workload
    ▪ This can be improved with a better hardware model
  23. Evaluation - Hardware-Dependent Metrics

    ▪ A simple experiment to validate the hardware-model hypothesis
      ▪ P1: traverse a non-contiguously allocated linked list
        ▪ ❌ MLP (Memory Level Parallelism), ❌ prefetching
        ▪ Error is within 5%
      ▪ P2: traverse a linked list in a contiguous chunk of memory
        ▪ ❌ MLP (Memory Level Parallelism), ✅ prefetching
        ▪ Error is 6X
      ▪ P3: traverse an array
        ▪ ✅ MLP (Memory Level Parallelism), ✅ prefetching
        ▪ Error is 9X
  24. Limitations

    ▪ Requirements:
      ▪ Separation of stateful and stateless code
      ▪ Pre-analysis library
    ▪ Doesn't support multi-threaded NFs with shared state
    ▪ Doesn't consider contention for cache and memory
    ▪ NFs that perform system calls or share a CPU core cannot be analyzed accurately
    ▪ Needs the source code of the NF for analysis (more convenient for PCV selection)
  25. Conclusion

    ▪ Goal
      ▪ Predict the performance of an NF without executing it
    ▪ Method
      ▪ Use performance critical variables to describe the NF
      ▪ Symbolic execution helps find all the potential paths
      ▪ Use the Bolt Distiller to learn which paths are more common in the real world
    ▪ Result
      ▪ Predicts NF performance with an error rate below 8%
  26. Pros & Cons

    ▪ Pros
      ▪ PCVs are flexible for analyzing NF performance
      ▪ The pre-analysis library can make predictions more precise
    ▪ Cons
      ▪ It only predicts CPU cycle and memory access counts, not the more practical metric: CPU usage
      ▪ Disabling link-time optimizations at compile time makes the prediction inaccurate
      ▪ Intel Pin is proprietary software and runs only on IA-32 and Intel 64