Introduction: LLVM
a collection of modular and reusable compiler and toolchain technologies
▪ Famous front-end: Clang (for C, C++, Objective-C, Objective-C++)
Introduction: Satisfiability Modulo Theories (SMT)
decision problem for logical formulas
▪ Examples of theories typically used in computer science
  ▪ The theories of real numbers, lists, arrays, bit vectors, and so on
▪ For instance:
  ▪ 3x + 2y − z ≥ 4
  ▪ f(f(u, v), v) = f(u, v)
▪ Famous solvers:
  ▪ Z3 (Microsoft, open source)
  ▪ STP (Simple Theorem Prover)
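To make "satisfiable" concrete for the first example formula, here is a minimal sketch that searches a small bounded integer domain for an assignment with 3x + 2y − z ≥ 4. This is a brute-force stand-in, not how Z3 or STP work internally. The function name `find_model` and the bound are assumptions made for illustration.

```python
from itertools import product


def find_model(bound=3):
    """Return the first (x, y, z) in [-bound, bound]^3 with 3x + 2y - z >= 4, else None."""
    domain = range(-bound, bound + 1)
    for x, y, z in product(domain, repeat=3):
        if 3 * x + 2 * y - z >= 4:
            return (x, y, z)
    return None


# A returned tuple is a witness that the formula is satisfiable.
# None only means "no model inside the bounded domain", not "unsatisfiable".
model = find_model()
```

A real SMT solver decides such formulas over the whole theory (here, linear arithmetic) rather than enumerating candidates.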
Introduction: Dynamic Binary Instrumentation (DBI)
run time on the compiled binary files
▪ Use a JIT engine to
  ▪ Analyze and label the executable
  ▪ Insert customized code at runtime
▪ Famous tools:
  ▪ Intel Pin (supports IA-32 and Intel 64 only, easy to use)
  ▪ DynamoRIO (supports IA-32/AMD64/ARM/AArch64, complicated)
  ▪ Frida (mostly used for Android hacking)
System Model
contract to predict the NF performance
▪ Performance Critical Variables (PCVs) describe the contract
▪ Use symbolic execution to find all potential execution paths in the NF
▪ A library of pre-analyzed stateful NF data structures
▪ Bolt Distiller
  ▪ Finds out which execution paths are common in the real world
Proposed Method - Performance Contract for LPM Router
PCV is
▪ The matched prefix length of the IP address, l
▪ (This example ignores all layers below the NF code)
Longest Prefix Matching, e.g. 140.114.111.222 / 16
[Figure: NIC port]
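The idea of a contract written over the PCV l can be sketched as follows. The linear scan over the routing table is only a stand-in for the real LPM data structure. The contract coefficients `PER_BIT` and `BASE` are hypothetical placeholders, not numbers from Bolt.

```python
import ipaddress

# Hypothetical contract coefficients (assumed for illustration only).
PER_BIT = 4
BASE = 5


def longest_prefix_match(table, addr):
    """Return (port, l) for the longest prefix in `table` that contains `addr`."""
    ip = ipaddress.ip_address(addr)
    best_port, best_len = None, -1
    for cidr, port in table:
        net = ipaddress.ip_network(cidr)
        if ip in net and net.prefixlen > best_len:
            best_port, best_len = port, net.prefixlen
    return best_port, best_len


def contract_instructions(l):
    """Predicted instruction count as a function of the PCV l (placeholder formula)."""
    return PER_BIT * l + BASE


table = [("0.0.0.0/0", 0), ("140.114.0.0/16", 1)]
port, l = longest_prefix_match(table, "140.114.111.222")
predicted = contract_instructions(l)
```

With this table the packet to 140.114.111.222 matches the /16 route, so the contract is evaluated at l = 16. The operator only needs l, not the code, to read off the prediction.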
of Generating Contract
▪ Requirements
  ▪ Well-defined separation between stateful and stateless NF code
  ▪ Pre-analysis library for stateful NF data structures
    ▪ Analyze once, reuse across NFs
▪ Choosing appropriate PCVs is a balance between precision and ease of use
  ▪ More PCVs could leak the implementation details of the NF
  ▪ Developers need more detail
  ▪ Operators need an easy analysis approach
for NF Chains
▪ Scenario
  ▪ The firewall drops all packets with IP options
  ▪ The router no longer receives packets with IP options
▪ How to improve the contract correctness?
  ▪ Generate performance contracts for the individual NFs in the chain
  ▪ Pair together traffic classes from communicating NFs
  ▪ For each pair, AND the respective constraints together
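The pairing step for the firewall + router scenario can be sketched like this. Traffic classes are modeled as a predicate plus a cost. All class names and costs are hypothetical. Feasibility of the ANDed constraints is checked by enumerating a tiny packet space, where a real tool would ask a solver.

```python
# Toy packet space: a single boolean feature is enough for this scenario.
PACKETS = [{"has_ip_options": v} for v in (False, True)]

# Hypothetical per-NF contracts: one entry per traffic class.
FIREWALL = [
    {"name": "drop_options", "constraint": lambda p: p["has_ip_options"],
     "cost": 10, "forwards": False},
    {"name": "pass", "constraint": lambda p: not p["has_ip_options"],
     "cost": 20, "forwards": True},
]
ROUTER = [
    {"name": "options_slow_path", "constraint": lambda p: p["has_ip_options"], "cost": 100},
    {"name": "fast_path", "constraint": lambda p: not p["has_ip_options"], "cost": 30},
]


def compose(first, second, packets):
    """Pair traffic classes of two chained NFs and keep only feasible pairs."""
    chain = []
    for a in first:
        if not a["forwards"]:
            # Dropped traffic never reaches the second NF.
            chain.append((a["name"], a["cost"]))
            continue
        for b in second:
            # AND the two constraints; keep the pair if some packet satisfies both.
            if any(a["constraint"](p) and b["constraint"](p) for p in packets):
                chain.append((a["name"] + "+" + b["name"], a["cost"] + b["cost"]))
    return chain


chain_contract = compose(FIREWALL, ROUTER, PACKETS)
naive_worst = max(c["cost"] for c in FIREWALL) + max(c["cost"] for c in ROUTER)
composed_worst = max(cost for _, cost in chain_contract)
```

The pair "pass + options_slow_path" is infeasible and disappears, so the composed worst case is lower than the naive sum of each NF's worst case.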
Details
▪ Instruction replay
  ▪ Use instrumentation to log instructions and the memory locations they access
  ▪ Disable all link-time optimizations
  ▪ Make Bolt always report the worst-case performance
▪ Hardware model employed
  ▪ Compute instructions: follow the Intel manual and adopt the worst-case performance (due to out-of-order instruction scheduling)
  ▪ Memory instructions: model only the private L1 data caches (never model proprietary features such as prefetching and parallelism)
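A conservative replay over a logged trace might look like the sketch below. The latencies, the cache size, and the fully associative LRU policy are assumptions chosen for illustration, not values from the Intel manual or from Bolt. Every L1 miss is charged the full miss penalty because nothing beyond L1 is modeled.

```python
from collections import OrderedDict

# Hypothetical worst-case latencies in cycles (placeholders).
WORST_CASE_CYCLES = {"add": 1, "mul": 3}
L1_HIT_CYCLES = 4
L1_MISS_CYCLES = 200
LINE_BYTES = 64


class L1Cache:
    """Fully associative LRU cache of `n_lines` lines (a simplification)."""

    def __init__(self, n_lines):
        self.n_lines = n_lines
        self.lines = OrderedDict()

    def access(self, addr):
        line = addr // LINE_BYTES
        if line in self.lines:
            self.lines.move_to_end(line)  # refresh LRU position
            return L1_HIT_CYCLES
        self.lines[line] = True
        if len(self.lines) > self.n_lines:
            self.lines.popitem(last=False)  # evict least recently used line
        return L1_MISS_CYCLES


def replay(trace, cache):
    """Sum worst-case cycles over (opcode, address-or-None) trace entries."""
    cycles = 0
    for op, addr in trace:
        if addr is None:
            cycles += WORST_CASE_CYCLES[op]  # compute instruction
        else:
            cycles += cache.access(addr)     # memory instruction
    return cycles


trace = [("add", None), ("load", 0), ("load", 8), ("mul", None), ("load", 64)]
total_cycles = replay(trace, L1Cache(n_lines=2))
```

The loads at addresses 0 and 8 share one cache line, so the second is a hit. The load at 64 touches a new line and is charged as a miss.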
Distiller
Why?
▪ There are several hundred execution paths for each NF
▪ But only some of them are usually triggered in the real world
How?
▪ Input:
  1. Real-world traffic (PCAP file)
  2. Stateless NF code with a slightly modified version of the data structures (tracing the number of loop iterations, logging the matched prefix length)
Result
▪ The matched prefix length in the LPM router is mostly 16~24 bits
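The kind of summary the Distiller produces for the LPM router can be sketched as a histogram of matched prefix lengths. A plain list of destination addresses stands in for the PCAP file, and the routing table is made up. Both are assumptions for illustration.

```python
import ipaddress
from collections import Counter


def matched_prefix_len(table, addr):
    """Length of the longest prefix in `table` containing `addr` (0 if none)."""
    ip = ipaddress.ip_address(addr)
    return max((net.prefixlen for net in table if ip in net), default=0)


def distill(table, destinations):
    """Count how often each matched prefix length occurs in the traffic."""
    return Counter(matched_prefix_len(table, d) for d in destinations)


table = [ipaddress.ip_network(c)
         for c in ("0.0.0.0/0", "140.114.0.0/16", "140.114.111.0/24")]
traffic = ["140.114.111.222", "140.114.1.1", "140.114.111.5", "8.8.8.8"]
histogram = distill(table, traffic)
```

The histogram tells the operator which values of the PCV actually occur, so the contract can be read at those values instead of at the worst case.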
Distiller
▪ For operators
  ▪ Distiller can be used to balance risk against resource utilization
▪ For developers
  ▪ Distiller can help them find out which assumptions are wrong, so they can optimize the code
📝 Note: Distiller doesn’t change the contract; it only tells the user which execution paths are more common
▪ Over-estimation is 7.5%
  ▪ Coalescing of execution paths within the stateful performance contract
  ▪ Small differences between the analyzed code and the production build
▪ To validate the hardware model hypothesis, here is a simple experiment
▪ P1: traverse a non-contiguously allocated linked list
  ▪ ❌ MLP (Memory Level Parallelism), ❌ prefetching
  ▪ Error is 5%
▪ P2: traverse a linked list in a contiguous chunk of memory
  ▪ ❌ MLP (Memory Level Parallelism), ✅ prefetching
  ▪ Error is 6X
▪ P3: traverse an array
  ▪ ✅ MLP (Memory Level Parallelism), ✅ prefetching
  ▪ Error is 9X
Separation of stateful/stateless code
▪ Pre-analysis library
▪ Doesn’t support multi-threaded NFs with shared state
▪ Doesn’t consider contention for cache and memory
▪ NFs that perform system calls or share a CPU core cannot be analyzed accurately
▪ Needs the source code of the NF for analysis (more convenient for PCV selection)
Conclusion
performance of NF without executing it
▪ Method
  ▪ Use performance critical variables to describe the NF
  ▪ Symbolic execution helps find all the potential paths
  ▪ Use Bolt Distiller to learn which paths are more common in the real world
▪ Result
  ▪ Predict the NF performance with an error rate < 8%
Pros & Cons
flexible for analyzing NF performance
▪ Pre-analysis library can make prediction more precise
▪ Cons
  ▪ It only predicts CPU cycle and memory access counts, not the most practical metric: CPU usage
  ▪ Disabling link-time optimizations during compilation makes the prediction inaccurate
  ▪ Intel Pin is proprietary software and runs only on IA-32 and Intel 64