information
• Intensive use of obfuscation and anti-debugging techniques: CFG flattening, virtual machines, dead code insertion
• Difficulty of gathering “meta information” of different kinds (exception handlers)
• Semantic complexity of particular assembler instructions (especially in the x86 instruction set): XLAT, DIV, CMPXCHG and so on
Methods of binary analysis
• Dynamic analysis
  • Analysis of one execution trace per program run
• Combined analysis
Analysis techniques:
• Symbolic execution
  • As a general rule, used in static analysis
• Analysis of marked data (taint analysis)
  • As a general rule, used in dynamic analysis
• Fuzzing
  • Expected input data is replaced by randomly generated bytes
• And many others
In practice, analysis tools use a mixture of techniques, because each method has its own limitations. Combining various approaches helps to partially (and sometimes completely) overcome those limitations.
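The fuzzing idea listed above (expected input replaced by randomly generated bytes) can be sketched in a few lines. This is only an illustration: `parse_record` is a hypothetical target invented for the example, and real fuzzers add coverage feedback, input mutation and crash triage on top of this loop.

```python
import random


def parse_record(data: bytes) -> int:
    """Hypothetical target: the first byte is a length field for the payload."""
    if not data:
        return 0
    length = data[0]
    payload = data[1:]
    if length > len(payload):
        # Stands in for the memory error a native parser would hit here.
        raise IndexError("length field exceeds payload size")
    return sum(payload[:length])


def fuzz(target, runs=1000, max_len=8, seed=0):
    """Feed random byte strings to target and collect the inputs that crash it."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    crashes = []
    for _ in range(runs):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:  # any exception counts as a finding
            crashes.append((data, exc))
    return crashes


crashes = fuzz(parse_record)
print(f"{len(crashes)} crashing inputs found")
```

Each collected input can be replayed against the target to confirm the crash, which is what makes even such a blind fuzzer useful as a first pass.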
information: process memory map, addresses of indirect calls and so on
• Program execution may require a specific environment
• It is not always possible to reproduce the results of a previously run analysis
Static analysis:
• In general, works faster
• A single analysis run can potentially cover an infinite number of execution paths
• Able to work even when some parts of the code or libraries are missing
• Unable to cope with obfuscation and encryption
Techniques – symbolic execution
function arguments) by symbolic values
• The analysis tool operates on symbolic expressions instead of their concrete counterparts
• Symbolic execution can, in principle, cover all execution paths in a single run
• Every execution path is represented by a program “state”, which holds all the constraints on the introduced symbolic variables (path and value constraints)
• SMT solver – a tool designed to solve constraints on symbolic variables
Symbolic execution: example
Let’s find the values of x and y that force the execution flow to reach the label ERROR.

    int twice(int v) { return 2 * v; }

    void test(int x, int y) {
        int z = twice(y);
        if (x == z) {
            if (x > y + 10)
                ERROR;          /* target label */
        }
    }

    int main() {
        int x = read();         /* input, treated as symbolic */
        int y = read();         /* input, treated as symbolic */
        test(x, y);
    }

Taken from http://www.srl.inf.ethz.ch/pa2015/Lecture8.pdf
After the inputs are read (x = read(); y = read();):
  Value constraints: X->x0, Y->y0
  Path constraints: True
After z = twice(y):
  Value constraints: X->x0, Y->y0, Z->2*y0
  Path constraints: True
Create two different states after the conditional branch if (x == z):
State a (branch taken):
  Value constraints: X->x0, Y->y0, Z->2*y0
  Path constraints: x0 = 2*y0
State b (branch not taken):
  Value constraints: X->x0, Y->y0, Z->2*y0
  Path constraints: x0 != 2*y0
Reachability condition of label ERROR:
  Value constraints: X->x0, Y->y0, Z->2*y0
  Path constraints: x0 = 2*y0 ∧ x0 > y0 + 10
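To make the last step concrete, the reachability condition can be handed to a solver. The sketch below uses a brute-force search over small integers as a stand-in for an SMT solver; the function names and the search bounds are assumptions made for this illustration, and a real tool would pass the same two constraints to a solver such as Z3.

```python
from itertools import product


def reaches_error(x0, y0):
    """Path constraints collected on the path to ERROR."""
    return x0 == 2 * y0 and x0 > y0 + 10


def find_model(constraint, lo=-50, hi=50):
    """Brute-force stand-in for an SMT solver: try small integer pairs."""
    for x0, y0 in product(range(lo, hi + 1), repeat=2):
        if constraint(x0, y0):
            return x0, y0
    return None


def program_under_test(x, y):
    """Concrete version of test(x, y), used to validate the model."""
    z = 2 * y
    if x == z:
        if x > y + 10:
            return "ERROR"
    return "ok"


model = find_model(reaches_error)
print(model, program_under_test(*model))
```

With this search order the first model found is x0 = 22, y0 = 11, and running the concrete program on these inputs does reach ERROR.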
Symbolic execution: applications
the 70s of the last century):
• Input values of the tested program are marked as symbolic
• If a program error or vulnerability is found at some point in the executable, we solve the reachability constraints for that point
• The resulting conditions on the input data form the required test
• This scheme is widely used in various verification systems
Set of IR instructions
  X86: mov eax, ecx
  IR:  STR R_ECX:32, , V_00:32
       STR V_00:32, , R_EAX:32
Pool of states (one state per execution path): State №1, State №2, State №…, State №500
Each state holds the following data:
• Current IP (instruction pointer)
• Symbolic context (registers, memory cells)
• Constraints
Searcher – selects a state from the pool
Executor (director) – processes a particular state
Interpreter – contains a handler for each IR instruction
Conditional branch with condition X:
• New state a: Constraints += X
• New state b: Constraints += ~X
• Add the new states into the pool
If some “interesting” point is reached, check its reachability: extract the path constraints from the state and solve the corresponding SMT task
SMT solvers: Z3, STP, Boolector
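The components above (pool of states, searcher, interpreter handlers, forking on a conditional branch, reachability check) can be sketched as a toy engine. Everything here is an assumption made for illustration: the tuple-based IR is invented for the example and is not the IR shown above, symbolic expressions are modelled as Python functions of an input model, the searcher is a plain FIFO queue, and a brute-force search stands in for the SMT solver.

```python
from collections import deque
from itertools import product


def sym(name):
    """Symbolic input: an expression that reads its value from a model."""
    return lambda model: model[name]


# Toy IR for the test(x, y) example:
#   ("assign", dst, build)     dst = build(ctx)
#   ("branch", build, target)  if build(ctx) holds goto target, else fall through
#   ("mark", label)            an "interesting" point
#   ("halt",)                  end of path
PROGRAM = [
    ("assign", "z", lambda c: (lambda m: 2 * c["y"](m))),               # 0
    ("branch", lambda c: (lambda m: c["x"](m) != c["z"](m)), 4),        # 1
    ("branch", lambda c: (lambda m: c["x"](m) <= c["y"](m) + 10), 4),   # 2
    ("mark", "ERROR"),                                                  # 3
    ("halt",),                                                          # 4
]


def solve(constraints, names, lo=-50, hi=50):
    """Brute-force stand-in for an SMT solver over small integers."""
    for values in product(range(lo, hi + 1), repeat=len(names)):
        model = dict(zip(names, values))
        if all(check(model) for check in constraints):
            return model
    return None


def explore(program, input_names):
    """Run every path; return a model for each reachable marked point."""
    start_ctx = {name: sym(name) for name in input_names}
    pool = deque([(0, start_ctx, [])])  # states: (ip, symbolic context, constraints)
    findings = {}
    while pool:
        ip, ctx, constraints = pool.popleft()  # searcher: FIFO
        op = program[ip]
        if op[0] == "assign":
            _, dst, build = op
            pool.append((ip + 1, {**ctx, dst: build(ctx)}, constraints))
        elif op[0] == "branch":
            _, build, target = op
            cond = build(ctx)
            negated = lambda m, cond=cond: not cond(m)
            pool.append((target, ctx, constraints + [cond]))      # state a
            pool.append((ip + 1, ctx, constraints + [negated]))   # state b
        elif op[0] == "mark":
            findings[op[1]] = solve(constraints, input_names)
            pool.append((ip + 1, ctx, constraints))
        # "halt": the state is dropped
    return findings


findings = explore(PROGRAM, ["x", "y"])
print(findings)
```

The sketch forks on every branch without checking whether the new state is feasible. A practical engine would query the solver at the fork and prune infeasible states, which is one way to limit the number of states in the pool.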
generate a smaller number of states?)
• Loop unrolling (how to process loops whose exit condition depends on a symbolic variable?)
• Symbolic pointers (how to handle load and store operations whose addresses are also symbolic?)
• Constraint complexity (some generated constraints are too hard for any SMT solver to evaluate and solve exactly)
• External resources (how to process file handles and other references to external objects?)
(unite) several states into a bigger one (but when and how?)
• Path explosion – simultaneous processing of multiple states (parallel symbolic execution)
• Loop unrolling, symbolic pointers – use specific SMT logics (how effective is this?)
• External resources – use a DSL to describe external calls in terms of the solver’s expressions
Analysis of marked data (taint analysis)
data processed by the program in the course of execution
• Answers the question of how the program processes particular pieces of input data
possibility of control flow hijacking (for example, as a result of a stack or heap overflow).
• Tainted arguments of particular functions (the printf family, system) suggest a possible vulnerability.
• Tainted resources (handles, mutexes that do not depend directly on user input) suggest a possible logic error in the program.
Drawbacks:
• Requires detailed analysis of every assembler instruction, which may be tedious for some architectures (x86)
• An ideal taint analysis should instrument all the code executed by the operating system (both in user mode and in kernel mode), which is not always possible.
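The propagation rule behind these checks can be sketched on a toy trace. The trace format, the register names, the "input" and "const" pseudo-operands and the sink list are all hypothetical, chosen only to illustrate the idea. A real tool needs a propagation rule per machine instruction, which is exactly the drawback noted above.

```python
def propagate_taint(trace, tainted_sources, sinks):
    """Propagate taint along a recorded trace and flag tainted sink arguments.

    trace: list of (op, dst, srcs) tuples. For "call", dst is the callee name
    and srcs are its arguments. For any other op, dst is overwritten with a
    value computed from srcs.
    """
    tainted = set(tainted_sources)
    alerts = []
    for index, (op, dst, srcs) in enumerate(trace):
        if op == "call":
            if dst in sinks and any(src in tainted for src in srcs):
                alerts.append((index, dst))
            continue
        if any(src in tainted for src in srcs):
            tainted.add(dst)       # result depends on marked data
        else:
            tainted.discard(dst)   # overwritten with clean data
    return tainted, alerts


TRACE = [
    ("mov", "eax", ["input"]),        # eax <- user input
    ("mov", "ebx", ["const"]),        # ebx <- constant
    ("add", "ecx", ["eax", "ebx"]),   # ecx <- eax + ebx
    ("mov", "eax", ["const"]),        # eax overwritten with a constant
    ("call", "system", ["ecx"]),      # sink called with a tainted argument
    ("call", "printf", ["ebx"]),      # sink called with a clean argument
]

tainted, alerts = propagate_taint(TRACE, {"input"}, {"system", "printf"})
print(sorted(tainted), alerts)
```

Only the call to system is reported, because its argument still depends on the input, while eax is cleared once it is overwritten with a constant.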
• Create snapshots of the entire process at chosen control points
• Instrument a concrete execution trace and, at the same time, fill a queue of symbolic constraints: for each conditional branch on the trace, push its constraint into the queue
• Roll back to the previous control point, select a new symbolic condition from the queue, solve the corresponding SMT task and substitute the solution into the concrete execution context (registers and memory areas)
• Instrument the new trace with the new parameters
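The loop described by these steps can be sketched on the earlier test(x, y) example. Several simplifications are assumed here: rolling back to a snapshot is approximated by re-running the program from the start, the instrumentation is written by hand inside `traced_run`, and a brute-force search stands in for the SMT solver. All function names are invented for the sketch.

```python
from collections import deque
from itertools import product


def traced_run(x, y):
    """Concrete run of the example that also records its branch decisions.

    Returns (reached_error, trace); each trace entry is a pair
    (predicate over the inputs, outcome of that predicate on this run).
    """
    trace = []
    z = 2 * y
    trace.append((lambda a, b: a == 2 * b, x == z))
    if x == z:
        trace.append((lambda a, b: a > b + 10, x > y + 10))
        if x > y + 10:
            return True, trace
    return False, trace


def solve(path, lo=-50, hi=50):
    """Brute-force stand-in for an SMT solver: find inputs following path."""
    for x, y in product(range(lo, hi + 1), repeat=2):
        if all(pred(x, y) == want for pred, want in path):
            return x, y
    return None


def concolic(initial=(0, 1)):
    """Flip one recorded branch at a time until ERROR is reached."""
    queue = deque([initial])
    seen = set()
    while queue:
        inputs = queue.popleft()
        reached, trace = traced_run(*inputs)
        if reached:
            return inputs
        for i in range(len(trace)):
            # Keep the first i decisions, negate decision i.
            path = trace[:i] + [(trace[i][0], not trace[i][1])]
            key = tuple(want for _, want in path)
            if key in seen:
                continue
            seen.add(key)
            new_inputs = solve(path)
            if new_inputs is not None:
                queue.append(new_inputs)
    return None


result = concolic()
print(result)
```

Starting from the inputs (0, 1), the loop first negates the outer branch, then the inner one, and ends with inputs that reach ERROR. Identifying a path by its tuple of outcomes is enough for this small program, while a real tool would key on branch addresses as well.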
converts asm instructions into the solver’s (Z3) expressions (bypassing internal representation)
Other: FuzzBall, BitBlaze
• There are no tools of production quality
• Every tool is focused on solving one particular task
and generate exploits
• Able to work with symbolic pointers
• Winner of the DARPA contest in 2016
CodeSurfer, VeraCode
• Little is known about their internal structure
of careful research
• At present, no universal binary analysis tool exists
• Every tool is focused on solving one particular task
• Positive Technologies is working on its own tool – STAY TUNED!