Slide 1

Slide 1 text

Communications and Networking Lab, NTHU
Performance Contracts for Software Network Functions
Rishabh Iyer, Luis Pedrosa, Arseniy Zaostrovnykh, Solal Pirelli, Katerina Argyraki, and George Candea
NSDI 2019
Speaker: Chun-Fu Kuo
Date: 2021.02.18

Slide 2

Slide 2 text

Outline
■ Introduction
■ Problem Formulation
■ Proposed Method
■ Evaluation
■ Limitations
■ Conclusion
■ Pros and Cons

Slide 3

Slide 3 text

Introduction - LLVM
■ The LLVM Project is a collection of modular and reusable compiler and toolchain technologies
■ Well-known front end: Clang (for C, C++, Objective-C, Objective-C++)

Slide 4

Slide 4 text

Introduction - Satisfiability Modulo Theories (SMT)
■ The SMT problem is a decision problem for logical formulas interpreted over background theories
■ Examples of theories typically used in computer science
  ■ The theories of real numbers, lists, arrays, bit vectors, and so on
■ For instance:
  ■ 3x + 2y − z ≥ 4
  ■ f(f(u, v), v) = f(u, v)
■ Well-known solvers:
  ■ Z3 (Microsoft, open source)
  ■ STP (Simple Theorem Prover)

Slide 5

Slide 5 text

Introduction - Z3
■ A cross-platform SMT solver by Microsoft
■ Usage:
  ■ A system of equations: a + b = 20, a + 2b = 10
  ■ A logical equivalence (De Morgan's law): ¬(a ∧ b) ≡ (¬a ∨ ¬b)
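The two usage formulas can be checked by exhaustive search, which makes the terms "satisfiable" and "valid" concrete. The sketch below does not use Z3's API or its algorithms. It is a brute-force stand-in, and the function names and the integer search bounds are assumptions made for illustration.

```python
from itertools import product


def solve_linear_system(lo=-100, hi=100):
    """Return every integer pair (a, b) in [lo, hi] with a + b = 20 and a + 2b = 10."""
    return [
        (a, b)
        for a in range(lo, hi + 1)
        for b in range(lo, hi + 1)
        if a + b == 20 and a + 2 * b == 10
    ]


def de_morgan_holds():
    """Check that not(a and b) equals (not a) or (not b) for all truth assignments."""
    return all(
        (not (a and b)) == ((not a) or (not b))
        for a, b in product([False, True], repeat=2)
    )


if __name__ == "__main__":
    print(solve_linear_system())  # satisfying assignments of the linear system
    print(de_morgan_holds())      # True means the equivalence is valid
```

A real SMT solver reaches the same answers symbolically, without enumerating the search space, so it also handles unbounded integers and much larger formulas.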

Slide 6

Slide 6 text

Introduction - Symbolic Execution (SE)
■ Runs a program on symbolic inputs and uses an SMT solver to check program correctness (software testing)
■ A.k.a. symbolic evaluation or symbex

Slide 7

Slide 7 text

Introduction - Symbolic Execution (SE)
■ Example of KLEE, a symbolic execution engine that analyzes LLVM bitcode

Program under test:

    #include "klee/klee.h"

    /* Three feasible paths: x == 0, x < 0, and x > 0. */
    int get_sign(int x) {
      if (x == 0)
        return 0;
      if (x < 0)
        return -1;
      else
        return 1;
    }

    int main() {
      int a;
      /* Mark a as symbolic so that KLEE explores every feasible path. */
      klee_make_symbolic(&a, sizeof(a), "a");
      return get_sign(a);
    }

KLEE output:

    KLEE: output directory = "klee-out-0"
    KLEE: done: total instructions = 31
    KLEE: done: completed paths = 3
    KLEE: done: generated tests = 3

Generated test inputs (one per path):

    object 0: int : 0
    object 0: int : 16843009
    object 0: int : -2147483648
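The relation between the three completed paths and the three generated tests can be illustrated by writing out each path constraint explicitly. This is a hand-written sketch, not KLEE output: the `PATHS` table and the helper `path_of` are assumptions introduced here, and `get_sign` is a direct Python transcription of the C function.

```python
# Path constraints of get_sign, written out by hand (an illustrative assumption).
# Each entry: (description, path constraint over x, value returned on that path).
PATHS = [
    ("x == 0", lambda x: x == 0, 0),
    ("x != 0 and x < 0", lambda x: x != 0 and x < 0, -1),
    ("x != 0 and x >= 0", lambda x: x != 0 and x >= 0, 1),
]

# The three concrete inputs reported in the generated tests.
KLEE_INPUTS = [0, 16843009, -2147483648]


def get_sign(x):
    """Python transcription of the C function under test."""
    if x == 0:
        return 0
    if x < 0:
        return -1
    return 1


def path_of(x):
    """Return the description of the single path whose constraint x satisfies."""
    matches = [name for name, constraint, _ in PATHS if constraint(x)]
    assert len(matches) == 1, "path constraints must be mutually exclusive"
    return matches[0]


if __name__ == "__main__":
    for value in KLEE_INPUTS:
        print(value, "->", path_of(value), "returns", get_sign(value))
```

Each generated input satisfies exactly one path constraint, which is why three paths yield three tests.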

Slide 8

Slide 8 text

Introduction - Dynamic Binary Instrumentation (DBI)
■ Instrumentation is performed at run time on the compiled binary files
■ Uses a JIT engine to
  ■ Analyze and label the executable
  ■ Insert customized code at run time
■ Well-known tools:
  ■ Intel Pin (supports IA-32 and Intel 64 only, easy to use)
  ■ DynamoRIO (supports IA-32/AMD64/ARM/AArch64, complicated)
  ■ Frida (mostly used for Android hacking)

Slide 9

Slide 9 text

Problem Formulation
■ Estimate the cost of a deployment
■ Estimate the performance after a configuration change
■ Estimate the risk under an adversarial workload

Slide 10

Slide 10 text

System Model
■ Bolt
  ■ Uses a performance contract to predict NF performance
  ■ Performance-critical variables (PCVs) parameterize the contract
  ■ Uses symbolic execution to find all potential execution paths in the NF
  ■ Relies on a library of pre-analyzed stateful NF data structures
■ Bolt Distiller
  ■ Finds out which execution paths are common in the real world

Slide 11

Slide 11 text

Proposed Method - Performance Contract for LPM Router
■ An LPM (Longest Prefix Matching) router
■ The PCV is l, the length of the matched prefix
■ (This example ignores all layers below the NF code)
[Figure labels: Longest Prefix Matching, e.g. 140.114.111.222/16; NIC port]
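A contract of this kind can be read as a table from traffic class to a formula over the PCV l. The sketch below shows one possible encoding. The traffic-class names, the metric names, and every coefficient are placeholder assumptions chosen for illustration, not values taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCost:
    """Cost of the form per_unit * l + constant, where l is the PCV."""
    per_unit: int
    constant: int

    def evaluate(self, pcv_value):
        return self.per_unit * pcv_value + self.constant


# Hypothetical contract: all coefficients are illustrative placeholders.
LPM_CONTRACT = {
    "invalid packet": {
        "instructions": LinearCost(0, 2),
        "memory_accesses": LinearCost(0, 1),
    },
    "valid packet": {
        "instructions": LinearCost(4, 5),
        "memory_accesses": LinearCost(1, 3),
    },
}


def predict(traffic_class, metric, l):
    """Evaluate the contract for one traffic class, metric, and prefix length l."""
    if not 0 <= l <= 32:
        raise ValueError("an IPv4 prefix length must be between 0 and 32")
    return LPM_CONTRACT[traffic_class][metric].evaluate(l)


if __name__ == "__main__":
    for length in (8, 16, 24, 32):
        print(length, predict("valid packet", "instructions", length))
```

Because the contract is a formula rather than a single number, an operator can evaluate it for the prefix lengths expected in a given deployment.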

Slide 12

Slide 12 text

Proposed Method - Performance Contract for MAC Bridge
■ Learn which port each source MAC address arrives on
■ Find the matching output port

Slide 13

Slide 13 text

Proposed Method - Performance Contract for MAC Bridge
[Figure: MAC bridge control flow; node labels: MACtable_put(), MACtable_get(), Drop(p), FORWARD(), BROADCAST(), return; branch label: key present]
■ C: number of hash collisions

Slide 14

Slide 14 text

Proposed Method - Contract Analysis Prototype

Slide 15

Slide 15 text

Proposed Method - Requirement of Generating Contract
■ Requirements
  ■ Well-defined separation between stateful and stateless NF code
  ■ Pre-analysis library for stateful NF data structures
    ■ Analyze once, reuse across NFs
■ Choosing appropriate PCVs is a balance between precision and ease of use
  ■ More PCVs could leak implementation details of the NF
  ■ Developers need more detail
  ■ Operators need an easy analysis approach

Slide 16

Slide 16 text

Proposed Method - Generating Contract

Slide 17

Slide 17 text

Proposed Method - Generating Contract

Slide 18

Slide 18 text

Proposed Method - Generating Contract

Slide 19

Slide 19 text

Proposed Method - Constraints for NF Chains

Slide 20

Slide 20 text

Proposed Method - Constraints for NF Chains
■ Scenario
  ■ The firewall drops all packets with IP options
  ■ The router therefore never receives packets with IP options
■ How to improve the contract's correctness?
  ■ Generate performance contracts for the individual NFs in the chain
  ■ Pair together traffic classes from communicating NFs
  ■ For each pair, AND the respective constraints together
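The pairing step can be sketched with the firewall and router scenario. Everything concrete below is an assumption made for illustration: the traffic-class names, the cost numbers, the one-field packet model, and the simplification that the firewall forwards packets unmodified. Feasibility of a conjoined constraint is checked by enumerating the tiny packet space instead of calling an SMT solver.

```python
# Hypothetical per-NF contracts: class name, constraint on the packet, cost.
FIREWALL = [
    {"name": "fw drop (IP options)", "constraint": lambda p: p["ip_options"],
     "cost": 10, "forwards": False},
    {"name": "fw accept (no IP options)", "constraint": lambda p: not p["ip_options"],
     "cost": 20, "forwards": True},
]
ROUTER = [
    {"name": "router slow path (IP options)", "constraint": lambda p: p["ip_options"],
     "cost": 500},
    {"name": "router fast path (no IP options)", "constraint": lambda p: not p["ip_options"],
     "cost": 50},
]

# The whole (toy) packet space: a packet either carries IP options or not.
PACKETS = [{"ip_options": False}, {"ip_options": True}]


def feasible(up, down, packets):
    """True if some packet satisfies both constraints (their logical AND)."""
    return any(up["constraint"](p) and down["constraint"](p) for p in packets)


def chain_contract(upstream, downstream, packets):
    """Pair traffic classes of two chained NFs and keep only feasible pairs."""
    classes = []
    for up in upstream:
        if not up["forwards"]:
            # Dropped packets never reach the downstream NF.
            classes.append((up["name"], up["cost"]))
            continue
        for down in downstream:
            if feasible(up, down, packets):
                classes.append((up["name"] + " -> " + down["name"],
                                up["cost"] + down["cost"]))
    return classes


if __name__ == "__main__":
    for name, cost in chain_contract(FIREWALL, ROUTER, PACKETS):
        print(cost, name)
```

With these placeholder numbers, adding the per-NF worst cases gives 20 + 500 = 520, while the paired contract gives a worst case of 70, because the router's slow path is unreachable behind the firewall.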

Slide 21

Slide 21 text

Proposed Method - Implementation Details
■ Instruction replay
  ■ Use instrumentation to log instructions and accessed memory locations
  ■ Disable all link-time optimizations
  ■ Ensure that Bolt always reports the worst-case performance
■ Hardware model employed
  ■ Compute instructions: follow the Intel manual and adopt the worst-case performance (because of out-of-order instruction scheduling)
  ■ Memory instructions: model only the private L1 data caches (proprietary features such as prefetching and parallelism are never modeled)
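A conservative memory model in this spirit can be sketched as a replay of the logged addresses through an L1-only cache, charging every miss the full main-memory latency. The sketch is an assumption-laden illustration, not Bolt's actual model: the line size, capacity, full associativity, LRU replacement, and both latency constants are placeholders.

```python
from collections import OrderedDict

# All constants are illustrative assumptions, not measured hardware values.
LINE_SIZE = 64        # bytes per cache line
L1_LINES = 512        # 32 KiB of L1 data cache, modeled as fully associative
L1_HIT_CYCLES = 4     # charged when the line is already in L1
MISS_CYCLES = 200     # charged otherwise; no prefetching or parallelism modeled


def memory_cycles(addresses, capacity=L1_LINES):
    """Replay a trace of byte addresses through an LRU L1 model and sum the cycles."""
    cache = OrderedDict()  # cache line number -> True, ordered by recency of use
    cycles = 0
    for address in addresses:
        line = address // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)        # refresh recency on a hit
            cycles += L1_HIT_CYCLES
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cycles += MISS_CYCLES
    return cycles


if __name__ == "__main__":
    # Sequential walk over sixteen 8-byte elements: two lines touched.
    print(memory_cycles(range(0, 128, 8)))
```

Because each miss is charged in full and in sequence, a model like this over-estimates code that real hardware speeds up through prefetching or overlapping memory accesses.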

Slide 22

Slide 22 text

Proposed Method - Implementation Details

Slide 23

Slide 23 text

Proposed Method - Bolt Distiller
Why?
■ There are several hundred execution paths for each NF
■ But only some of them are usually triggered in the real world
How?
■ Input:
  1. Real-world traffic (PCAP file)
  2. Stateless NF code with a slightly modified version of the data structures (tracing the number of loop iterations, logging the matched prefix length)
Result
■ The matched prefix length in the LPM router is mostly 16 to 24 bits
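The distillation step amounts to summarizing the PCV values logged while replaying traffic. The sketch below assumes the log is already available as a list of matched prefix lengths. The function names and the sample log are made up for illustration, and PCAP parsing is deliberately left out.

```python
from collections import Counter


def distill(pcv_log):
    """Return the fraction of packets observed for each logged PCV value."""
    counts = Counter(pcv_log)
    total = len(pcv_log)
    return {value: counts[value] / total for value in sorted(counts)}


def share_in_range(distribution, lo, hi):
    """Fraction of packets whose PCV value lies in [lo, hi]."""
    return sum(fraction for value, fraction in distribution.items()
               if lo <= value <= hi)


if __name__ == "__main__":
    # Hypothetical matched prefix lengths logged by the modified data structure.
    sample_log = [16, 24, 24, 8, 20, 16, 32, 24, 22, 18]
    distribution = distill(sample_log)
    print(distribution)
    print(share_in_range(distribution, 16, 24))
```

Feeding the common PCV values into the contract then shows which execution paths dominate for that traffic.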

Slide 24

Slide 24 text

Proposed Method - Bolt Distiller
■ For operators
  ■ The Distiller can be used to balance risk against resource utilization
■ For developers
  ■ The Distiller shows which of their assumptions are wrong, so they can optimize the code
📝 Note: The Distiller doesn’t change the contract; it only tells the user which execution paths are more common

Slide 25

Slide 25 text

Evaluation
■ Environment
  ■ CPU: E5-2667
  ■ RAM: 32 GB
  ■ NIC: Intel 82599ES 10 Gb (directly connected)
  ■ Traffic generator: MoonGen
■ NFs
  ■ Br: MAC bridge
  ■ LPM: LPM router (uses a DPDK data structure)
  ■ NAT: a formally verified NAT
  ■ LB: Maglev-like load balancer

Slide 26

Slide 26 text

Evaluation
■ Bolt generates contracts that cover many possible execution paths
■ Metrics:
  ■ Number of executed instructions
  ■ Number of memory accesses
  ■ Number of execution cycles

Slide 27

Slide 27 text

Evaluation - Hardware-Independent Metrics
■ Over-estimation is 7.5%; its sources are:
  ■ Coalescing of execution paths within the stateful performance contract
  ■ Small differences between the analyzed code and the production build

Slide 28

Slide 28 text

Evaluation - Hardware-Dependent Metrics
■ Over-estimation is 4.08X for the typical workload and 9.26X for the pathological (unconstrained) workload
■ This can be improved with a better hardware model

Slide 29

Slide 29 text

Evaluation - Hardware-Dependent Metrics
■ A simple experiment validates the hardware model hypothesis
■ P1: traverse a non-contiguously allocated linked list
  ■ ❌ MLP (Memory Level Parallelism), ❌ prefetching
  ■ Error is within 5%
■ P2: traverse a linked list in a contiguous chunk of memory
  ■ ❌ MLP (Memory Level Parallelism), ✅ prefetching
  ■ Error is 6X
■ P3: traverse an array
  ■ ✅ MLP (Memory Level Parallelism), ✅ prefetching
  ■ Error is 9X

Slide 30

Slide 30 text

Limitations
■ Requirements:
  ■ Separation of stateful and stateless code
  ■ Pre-analysis library
■ Doesn’t support multi-threaded NFs with shared state
■ Doesn’t consider cache and memory contention
■ NFs that perform system calls or share a CPU core cannot be analyzed accurately
■ Needs the NF source code for analysis (more convenient for PCV selection)

Slide 31

Slide 31 text

Conclusion
■ Goal
  ■ Predict the performance of an NF without executing it
■ Method
  ■ Use performance-critical variables to describe the NF
  ■ Use symbolic execution to find all potential execution paths
  ■ Use the Bolt Distiller to learn which paths are most common in the real world
■ Result
  ■ Predicts NF performance with an error rate below 8% on hardware-independent metrics

Slide 32

Slide 32 text

Pros & Cons
■ Pros
  ■ PCVs are flexible for analyzing NF performance
  ■ The pre-analysis library can make predictions more precise
■ Cons
  ■ It only predicts CPU cycle and memory access counts, not the most practical metric: CPU usage
  ■ Disabling link-time optimizations at compile time makes the prediction less accurate for production builds
  ■ Intel Pin is proprietary software and runs only on IA-32 and Intel 64